Iterating through a pandas DataFrame and creating new columns using a custom function
I have a pandas DataFrame which (obviously) contains some data. I have created a function that outputs a number of new columns. How can I iterate over the rows or apply that function?
I have created a minimal example below (not the actual problem), with a DataFrame and a function.
EDIT: Think of the function as a "black box". We don't know what is inside it, but based on the input it returns a DataFrame that should be added to the existing DataFrame.
import pandas as pd
a=pd.DataFrame({"input1": ["a","b"], "input2":[3,2]})
input1 input2
0 a 3
1 b 2
def f(i1, i2):
    return pd.DataFrame([{"repeat": [i1] * i2, "square": i2 ** 2}])
So in this case the function returns two new columns, "repeat" and "square":
f(a.iloc[0,0],a.iloc[0,1])
repeat square
0 [a, a, a] 9
f(a.iloc[1,0],a.iloc[1,1])
repeat square
0 [b, b] 4
What I would like to end up with is a data frame like this:
input1 input2 repeat square
0 a 3 [a, a, a] 9
1 b 2 [b, b] 4
Does anyone have an elegant solution to this?
I'd do it using assign:
import numpy as np

a = a.assign(
    repeat=a['input1'].repeat(a['input2']).groupby(level=0).agg(list),
    square=np.square(a['input2']),
)
Output:
>>> a
input1 input2 repeat square
0 a 3 [a, a, a] 9
1 b 2 [b, b] 4
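If f really has to stay a black box, a row-wise variant is also possible. The following is only a sketch, assuming f returns a single-row DataFrame as in the question; the names new_cols and out are made up for illustration.

```python
import pandas as pd

def f(i1, i2):
    # Stand-in for the black-box function from the question
    return pd.DataFrame([{"repeat": [i1] * i2, "square": i2 ** 2}])

a = pd.DataFrame({"input1": ["a", "b"], "input2": [3, 2]})

# Call f once per row; .iloc[0] turns its one-row result into a Series,
# which apply(axis=1) expands into one new column per key
new_cols = a.apply(lambda row: f(row["input1"], row["input2"]).iloc[0], axis=1)

# join aligns on the index, so the new columns line up with the original rows
out = a.join(new_cols)
print(out)
```

This calls f once per row in Python, so it will be slower than the vectorised assign version on large frames, but it does not need to know what f does internally.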
You can try this modification of the f function:
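import pandas as pd

def f(df, col1, col2):
    # Build one list per row, e.g. ["a", "a", "a"]; multiplying the two columns directly would give the string "aaa"
    repeated = [[value] * count for value, count in zip(df[col1], df[col2])]
    df_ = pd.DataFrame([{"repeat": repeated}]).explode("repeat", ignore_index=True)
    df_["square"] = list(df[col2] ** 2)
    return pd.concat([df, df_], axis=1)

a = pd.DataFrame({"input1": ["a", "b"], "input2": [3, 2]})
f(a, "input1", "input2")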
How about using pd.concat?
generated_df = pd.concat([f(*args) for args in a.to_numpy()], ignore_index=True)
out = pd.concat([a, generated_df], axis=1)
Output:
>>> out
input1 input2 repeat square
0 a 3 [a, a, a] 9
1 b 2 [b, b] 4
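One thing to watch with this approach: pd.concat(..., axis=1) aligns on the index, and ignore_index=True gives the generated frame a fresh 0..n-1 index. If a does not have the default index, the rows will not line up. A possible sketch of a variant that copies the index of a over before concatenating, again assuming f returns exactly one row per call (the index labels "x" and "y" and the names pieces, generated and out are illustrative only):

```python
import pandas as pd

def f(i1, i2):
    # Stand-in for the black-box function from the question
    return pd.DataFrame([{"repeat": [i1] * i2, "square": i2 ** 2}])

# Same data as in the question, but with a non-default index
a = pd.DataFrame({"input1": ["a", "b"], "input2": [3, 2]}, index=["x", "y"])

# itertuples keeps each column's own dtype instead of upcasting to object
pieces = [f(row.input1, row.input2) for row in a.itertuples(index=False)]
generated = pd.concat(pieces, ignore_index=True)

# One generated row per input row, so the original index can be reused
generated.index = a.index
out = pd.concat([a, generated], axis=1)
print(out)
```

With the default RangeIndex used in the question this gives the same result as the two-line version above.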