Using df.apply on a function with multiple inputs to generate multiple outputs

Question

I have a dataframe that looks like this:

initial year0 year1
0       0     12
1       1     13
2       2     14
3       3     15

Note that the number of year columns year0, year1... (year_count) is completely variable, but it stays constant throughout this code.

I first wanted to apply a function to each of the 'year' columns to generate 'mod' columns, like so:

def mod(year, scalar):
    return (year * scalar)

s = 5
year_count = 2
# Generate new columns
df[[f"mod{y}" for y in range(year_count)]] = df[[f"year{y}" for y in range(year_count)]].apply(mod, scalar=s)

initial year0 year1 mod0 mod1
0       0     12    0   60
1       1     13    5   65
2       2     14    10  70
3       3     15    15  75
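
For anyone who wants to reproduce this first step, here is a minimal self-contained sketch. It assumes the leftmost column in the printed tables is the row index, so only year0 and year1 are built as data columns; the helper names year_cols and mod_cols are illustrative and not part of the original code. It uses pd.concat rather than assigning to a list of new column names, which should behave the same way across pandas versions.

```python
import pandas as pd

def mod(year, scalar):
    return year * scalar

# Sample data copied from the tables above (leftmost printed column assumed to be the index).
df = pd.DataFrame({"year0": [0, 1, 2, 3], "year1": [12, 13, 14, 15]})

s = 5
year_count = 2
year_cols = [f"year{y}" for y in range(year_count)]  # illustrative helper name
mod_cols = [f"mod{y}" for y in range(year_count)]    # illustrative helper name

# apply() calls mod once per year column and forwards scalar=s to it.
mods = df[year_cols].apply(mod, scalar=s)

# The result is still labelled year0, year1, so relabel it before attaching it.
mods.columns = mod_cols
df = pd.concat([df, mods], axis=1)

print(df)
```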

All good so far. The problem is that I now want to apply another function to each year column together with its corresponding mod column to generate another set of val columns, so something like:

def sum_and_scale(year_col, mod_col, scale):
    return (year_col + mod_col) * scale

Then I apply this to each pair of columns (year0, mod0), (year1, mod1), etc. to generate the next tranche of columns.

With scale = 10 I should end up with:

initial year0 year1 mod0 mod1 val0 val1
0       0     12    0    60   0    720
1       1     13    5    65   60   780
2       2     14    10   70   120  840
3       3     15    15   75   180  900

This is where I'm stuck. I don't know how to pass two existing df columns into a function using the same structure as in the first example, and if I do something like

df[['val0', 'val1']] = df['col1', 'col2'].apply(lambda x: sum_and_scale('mod0', 'mod1', scale=10))

I don't know how to generalise this to have arbitrary inputs and outputs and also apply the constant scale parameter. (I know the last piece of code won't work, but it's the other avenue to a solution I've seen.)

The reason I'm asking is that I believe the loop I currently have working is causing performance issues, given the number of columns and the length of each column.

Thanks

Answer 1

IMHO, it's better with a simple for loop:

for i in range(2):
    df[f'val{i}'] = sum_and_scale(df[f'year{i}'], df[f'mod{i}'], scale=10)
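
As a runnable sketch of the loop above, driven by year_count instead of a hard-coded 2, the following rebuilds the sample data from the question. It also shows one possible loop-free variant: because sum_and_scale is plain elementwise arithmetic, it can be handed the whole year block and the whole mod block as NumPy arrays, which pair up by position (year0 with mod0, year1 with mod1). The names year_cols, mod_cols, val_cols, looped and blocked are illustrative, and whether the block version is actually faster on the asker's data has not been measured here.

```python
import pandas as pd

def sum_and_scale(year_col, mod_col, scale):
    return (year_col + mod_col) * scale

# Sample data copied from the question's second table.
df = pd.DataFrame({
    "year0": [0, 1, 2, 3], "year1": [12, 13, 14, 15],
    "mod0": [0, 5, 10, 15], "mod1": [60, 65, 70, 75],
})

year_count = 2
year_cols = [f"year{i}" for i in range(year_count)]
mod_cols = [f"mod{i}" for i in range(year_count)]
val_cols = [f"val{i}" for i in range(year_count)]

# Loop version: one vectorized column operation per year.
looped = df.copy()
for i in range(year_count):
    looped[f"val{i}"] = sum_and_scale(looped[f"year{i}"], looped[f"mod{i}"], scale=10)

# Block version: convert both blocks to arrays so they combine by position
# rather than by column label, then attach the result under the val* names.
vals = sum_and_scale(df[year_cols].to_numpy(), df[mod_cols].to_numpy(), scale=10)
blocked = pd.concat(
    [df, pd.DataFrame(vals, columns=val_cols, index=df.index)], axis=1
)

print(looped)
print(blocked)
```

Converting to arrays matters in the block version: adding df[year_cols] to df[mod_cols] directly would align on column labels, and since the year and mod labels never match, the result would be all NaN.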

Solution 1
0 2020-09-24 17:33:27
