What is the most pythonic way to apply a function on and return multiple columns?
While using Pandas, I often encounter a case where an existing function takes in multiple arguments and returns multiple values:
def foo(val_a, val_b):
    """
    Some example function that takes in and returns multiple values.
    Can be a lot more complex.
    """
    sm = val_a + val_b
    sb = val_a - val_b
    mt = val_a * val_b
    dv = val_a / val_b
    return sm, sb, mt, dv
Suppose I have a dataframe:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df
Out[6]:
0 1
0 1 2
1 3 4
2 5 6
3 7 8
What I want is to apply foo on df with columns 0 and 1 as arguments, and put the results into new columns of df, without modifying foo, like this:
df_out
Out[7]:
0 1 sm sb mt dv
0 1 2 3 -1 2 0.5
1 3 4 7 -1 12 0.75
2 5 6 11 -1 30 0.833
3 7 8 15 -1 56 0.875
What is the most pythonic way to achieve this?
>>> pd.concat([df, df.from_records(foo(df[0], df[1])).T], axis=1)
0 1 0 1 2 3
0 1 2 3.0 -1.0 2.0 0.500000
1 3 4 7.0 -1.0 12.0 0.750000
2 5 6 11.0 -1.0 30.0 0.833333
3 7 8 15.0 -1.0 56.0 0.875000
Speed: 1.13 ms per loop
If you care about speed, this is faster than using apply and gives your desired output.
>>> import numpy as np
>>> pd.concat([df, df.from_records(np.vectorize(foo)(df[0], df[1])).T], axis=1)
Speed: 728 µs per loop
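Timings like these depend on the machine, the pandas version, and the size of the frame, so it is worth measuring on your own data. Below is a minimal sketch of how such a comparison could be set up with timeit. The helper names whole_columns and row_by_row are made up for this illustration and are not part of pandas, and foo is a compact copy of the function from the question.

```python
import timeit

import pandas as pd


def foo(val_a, val_b):
    """Compact copy of the example function from the question."""
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = ['sm', 'sb', 'mt', 'dv']


def whole_columns():
    # foo is called once with two Series and returns a tuple of four Series.
    new_cols = pd.concat(foo(df[0], df[1]), axis=1, keys=cols)
    return pd.concat([df, new_cols], axis=1)


def row_by_row():
    # foo is called once per row; each returned tuple becomes one row.
    rows = df.apply(lambda r: foo(r[0], r[1]), axis=1).tolist()
    new_cols = pd.DataFrame(rows, columns=cols, index=df.index)
    return pd.concat([df, new_cols], axis=1)


# Total time for 200 calls of each variant (hypothetical repeat count).
t_cols = timeit.timeit(whole_columns, number=200)
t_rows = timeit.timeit(row_by_row, number=200)
print(f"whole columns: {t_cols:.4f} s, row by row: {t_rows:.4f} s")
```

On a four-row frame the difference is dominated by fixed overhead, so repeat the measurement with a frame of realistic size before drawing conclusions.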
# Apply foo row by row, build a DataFrame from the return values, then merge it into the existing DataFrame on the index.
merged = pd.merge(df, df.apply(lambda x: pd.Series(foo(x[0], x[1])), axis=1), left_index=True, right_index=True)
# Change the column names.
merged.columns = [0, 1, 'sm', 'sb', 'mt', 'dv']
merged
Out[1478]:
0 1 sm sb mt dv
0 1 2 3.0 -1.0 2.0 0.500000
1 3 4 7.0 -1.0 12.0 0.750000
2 5 6 11.0 -1.0 30.0 0.833333
3 7 8 15.0 -1.0 56.0 0.875000
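As a possible variation on the apply-then-merge idea, DataFrame.apply accepts result_type='expand', which turns each returned tuple into columns, and DataFrame.join aligns on the index without naming merge keys. This is only a sketch under the assumption that a reasonably recent pandas is installed (result_type was added in 0.23); foo is a compact copy of the function from the question.

```python
import pandas as pd


def foo(val_a, val_b):
    """Compact copy of the example function from the question."""
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = ['sm', 'sb', 'mt', 'dv']

# result_type='expand' spreads each returned tuple across columns.
expanded = df.apply(lambda row: foo(row[0], row[1]), axis=1, result_type='expand')
expanded.columns = cols

# join aligns the two frames on their shared index.
merged = df.join(expanded)
print(merged)
```

Naming the new columns before joining avoids the clash between the original labels 0 and 1 and the default labels of the expanded frame.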
You can use apply + DataFrame constructor:
cols = ['sm','sb','mt','dv']
df[cols] = pd.DataFrame(df.apply(lambda x: foo(x[0], x[1]), 1).values.tolist(),columns= cols)
print (df)
0 1 sm sb mt dv
0 1 2 3 -1 2 0.500000
1 3 4 7 -1 12 0.750000
2 5 6 11 -1 30 0.833333
3 7 8 15 -1 56 0.875000
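One thing to watch with the constructor approach: pandas aligns by index label on assignment, and a DataFrame built from a plain list gets a fresh 0..n-1 index. If df has any other index, the new columns can end up misaligned or filled with missing values. A small sketch of one way to guard against that, passing index=df.index explicitly; the string index here is a made-up example.

```python
import pandas as pd


def foo(val_a, val_b):
    """Compact copy of the example function from the question."""
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


# Hypothetical frame with a non-default index.
df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'])
cols = ['sm', 'sb', 'mt', 'dv']

# One tuple of four values per row.
rows = df.apply(lambda row: foo(row[0], row[1]), axis=1)

# Reuse df's index so the assignment lines up row for row.
df[cols] = pd.DataFrame(rows.tolist(), columns=cols, index=df.index)
print(df)
```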
Solution with concat:
cols = ['sm','sb','mt','dv']
df[cols] = pd.concat(foo(df[0], df[1]), axis=1, keys=cols)
print (df)
0 1 sm sb mt dv
0 1 2 3 -1 2 0.500000
1 3 4 7 -1 12 0.750000
2 5 6 11 -1 30 0.833333
3 7 8 15 -1 56 0.875000
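The concat solution passes whole columns to foo, so it only works when foo can operate on Series. If the function contains scalar-only logic such as an if on its arguments, one option is to wrap it in np.vectorize, which calls it element by element and returns one array per output. The function safe_foo below is a hypothetical variant invented for this sketch, not something from the question.

```python
import numpy as np
import pandas as pd


def safe_foo(val_a, val_b):
    """Hypothetical scalar-only variant of foo that guards the division."""
    dv = val_a / val_b if val_b != 0 else float('nan')
    return val_a + val_b, val_a - val_b, val_a * val_b, dv


df = pd.DataFrame([[1, 2], [3, 0], [5, 6]])
cols = ['sm', 'sb', 'mt', 'dv']

# np.vectorize loops over the elements and returns a tuple of four arrays.
results = np.vectorize(safe_foo)(df[0], df[1])

# Pair each output array with its column name and keep df's index.
df[cols] = pd.DataFrame(dict(zip(cols, results)), index=df.index)
print(df)
```

Note that np.vectorize is a convenience wrapper around a Python loop, and it infers the output types from the first call unless otypes is given.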
It is also possible to create a new DataFrame and then concat it with the original:
cols = ['sm','sb','mt','dv']
df1 = pd.concat(foo(df[0], df[1]), axis=1, keys=cols)
print (df1)
sm sb mt dv
0 3 -1 2 0.500000
1 7 -1 12 0.750000
2 11 -1 30 0.833333
3 15 -1 56 0.875000
df = pd.concat([df, df1], axis=1)
print (df)
0 1 sm sb mt dv
0 1 2 3 -1 2 0.500000
1 3 4 7 -1 12 0.750000
2 5 6 11 -1 30 0.833333
3 7 8 15 -1 56 0.875000
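If you would rather not build an intermediate DataFrame, another option along the same lines is DataFrame.assign, pairing the column names with the returned Series via zip. A minimal sketch, again assuming foo works on whole Series; foo is a compact copy of the function from the question.

```python
import pandas as pd


def foo(val_a, val_b):
    """Compact copy of the example function from the question."""
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = ['sm', 'sb', 'mt', 'dv']

# zip pairs each name with the matching Series; assign returns a new frame.
df_out = df.assign(**dict(zip(cols, foo(df[0], df[1]))))
print(df_out)
```

assign leaves df itself unchanged, which can be handy when the original frame is still needed.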