How to efficiently add multiple columns to pandas data frame with values that depend on other dynamic columns

Question

How can I replace the following code with a better solution? On a large data set with many columns, this code takes too long.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Jan': [10, 20], 'Feb': [3, 5], 'Mar': [30, 4],
                   'Month': [3, 2], 'Year': [2016, 2016]})

#     Jan   Feb   Mar    Month  Year
# 0   10     3    30     3      2016
# 1   20     5    4      2      2016

df['Antal_1'] = np.nan
df['Antal_2'] = np.nan

# Month is 1-based, so Month - 1 is the position of that month's column
for i in range(len(df)):
    if df['Year'][i] == 2016:
        df.loc[i, 'Antal_1'] = df.iloc[i, df['Month'][i] - 1]
        df.loc[i, 'Antal_2'] = df.iloc[i, df['Month'][i] - 2]
    else:
        df.loc[i, 'Antal_1'] = df.iloc[i, -1]
        df.loc[i, 'Antal_2'] = df.iloc[i, -2]
df
#     Jan   Feb   Mar    Month  Year  Antal_1  Antal_2
# 0   10     3    30     3      2016    30       3
# 1   20     5    4      2      2016    5       20

Answer 1

You should see a marginal speed-up by using df.apply instead of iterating rows:

import pandas as pd

df = pd.DataFrame({'Jan': [10, 20], 'Feb': [3, 5], 'Mar': [30, 4],
                   'Month': [3, 2],'Year': [2016, 2016]})

df = df[['Jan', 'Feb', 'Mar', 'Month', 'Year']]

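def calculator(row):
    m1 = row['Month']                  # 1-based month number for this row
    m2 = row.index.get_loc('Month')    # position of the 'Month' column
    # .iloc makes the integers positional; plain row[int] on a string index is deprecated
    return (row.iloc[int(m1-1)], row.iloc[int(m1-2)]) if row['Year'] == 2016 \
           else (row.iloc[m2-1], row.iloc[m2-2])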

df['Antal_1'], df['Antal_2'] = list(zip(*df.apply(calculator, axis=1)))

#    Jan  Feb  Mar  Month  Year  Antal_1  Antal_2
# 0   10    3   30      3  2016       30        3
# 1   20    5    4      2  2016        5       20

Answer 2

It's not clear to me what you want to do in the case of the year not being 2016, so I've made the value 100. Show an example and I can finish it. If it's just NaNs, then you can remove the first two lines from below.

df['Antal_1'] = 100
df['Antal_2'] = 100
df.loc[df['Year']==2016, 'Antal_1'] = df[df.columns[df.columns.get_loc("Month")-1]]
df.loc[df['Year']==2016, 'Antal_2'] = df[df.columns[df.columns.get_loc("Month")-2]]
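
Note that the two df.loc lines above always read the fixed columns just left of Month. If the per-row Month value is meant to pick the column, as in the question's loop, NumPy integer indexing may avoid the Python-level loop entirely. The following is only a sketch under assumptions: the month_cols list, the extra 2015 row, and the choice of the last two month columns for non-2016 rows (borrowed from Answer 1's reading of the question) are illustrative, not something the question confirms.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical non-2016 row
df = pd.DataFrame({'Jan': [10, 20, 7], 'Feb': [3, 5, 8], 'Mar': [30, 4, 9],
                   'Month': [3, 2, 1], 'Year': [2016, 2016, 2015]})

month_cols = ['Jan', 'Feb', 'Mar']     # assumed: month columns in calendar order
values = df[month_cols].to_numpy()     # 2-D array, one row per DataFrame row
rows = np.arange(len(df))

is_2016 = (df['Year'] == 2016).to_numpy()
last = len(month_cols) - 1

# Column position to read per row: Month - 1 for 2016, else the last month column
pos_1 = np.where(is_2016, df['Month'].to_numpy() - 1, last)
# The column just before it; -1 would wrap to the last month column,
# so handle Month == 1 explicitly if it can occur for 2016 rows
pos_2 = pos_1 - 1

# Pick one value per row with paired (row, column) integer indices
df['Antal_1'] = values[rows, pos_1]
df['Antal_2'] = values[rows, pos_2]
```

For the two rows from the question this gives the same Antal_1 and Antal_2 values as the loop, and the cost is a handful of array operations regardless of the number of rows.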

Answer 1: score 1, accepted, 2018-02-15 23:38:36

Answer 2: score 0, 2018-02-15 20:56:00
