Using df.apply on a function with multiple inputs to generate multiple outputs

Question

I have a dataframe that looks like this

initial  year0  year1
0        0      12
1        1      13
2        2      14
3        3      15

Note that the number of year columns (year0, year1, ...), given by year_count, varies from case to case but stays constant throughout this code.

I first wanted to apply a function to each of the 'year' columns to generate 'mod' columns like so

def mod(year, scalar):
    return (year * scalar)

s = 5
year_count = 2
# Generate new columns
df[[f"mod{y}" for y in range(year_count)]] = df[[f"year{y}" for y in range(year_count)]].apply(mod, scalar=s)

initial  year0  year1  mod0  mod1
0        0      12     0     60
1        1      13     5     65
2        2      14     10    70
3        3      15     15    75

All good so far. The problem is that I now want to apply another function to both the year column and its corresponding mod column to generate another set of val columns, so something like

def sum_and_scale(year_col, mod_col, scale):
    return (year_col + mod_col) * scale

Then I apply this to each of the columns (year0, mod0), (year1, mod1) etc to generate the next tranche of columns.

With scale = 10 I should end up with

initial  year0  year1  mod0  mod1  val0  val1
0        0      12     0     60    0     720
1        1      13     5     65    60    780
2        2      14     10    70    120   840
3        3      15     15    75    180   900

This is where I'm stuck - I don't know how to put two existing df columns together in a function with the same structure as in the first example, and if I do something like

df[['val0', 'val1']] = df['col1', 'col2'].apply(lambda x: sum_and_scale('mod0', 'mod1', scale=10))

I don't know how to generalise this to arbitrary inputs and outputs while also applying the constant scale parameter. (I know the last piece of code won't work, but it's the other avenue to a solution I've seen.)

The reason I'm asking is because I believe the loop that I currently have working is creating performance issues with the number of columns and the length of each column.

Thanks

Answer 1

IMHO, it's better with a simple for loop:

for i in range(year_count):
    df[f'val{i}'] = sum_and_scale(df[f'year{i}'], df[f'mod{i}'], scale=10)
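If the loop itself turns out to be the bottleneck, one possible alternative is to run the function once on whole blocks of columns. This is only a sketch, and it assumes sum_and_scale uses nothing but elementwise arithmetic, as it does in the question. The names year_cols, mod_cols and val_cols are helpers introduced here for illustration. The blocks are converted with .to_numpy() because adding the two DataFrames directly would align on column labels (year0 against mod0) and give NaN everywhere.

```python
import pandas as pd

def sum_and_scale(year_col, mod_col, scale):
    return (year_col + mod_col) * scale

# Sample data from the question, with the mod columns already computed.
df = pd.DataFrame({
    "year0": [0, 1, 2, 3],
    "year1": [12, 13, 14, 15],
    "mod0": [0, 5, 10, 15],
    "mod1": [60, 65, 70, 75],
})
year_count = 2

# Hypothetical helper lists of column names, one per block.
year_cols = [f"year{i}" for i in range(year_count)]
mod_cols = [f"mod{i}" for i in range(year_count)]
val_cols = [f"val{i}" for i in range(year_count)]

# Work on the underlying NumPy arrays so the blocks pair up by position
# (year0 with mod0, year1 with mod1) rather than by column label.
vals = sum_and_scale(df[year_cols].to_numpy(), df[mod_cols].to_numpy(), scale=10)

# Attach the results as new columns in a single step.
df = df.join(pd.DataFrame(vals, index=df.index, columns=val_cols))
print(df)
```

Whether this is actually faster than the loop depends on the number and length of the columns, so it is worth timing both on the real data.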
