Is there a better solution than the following code? On a big data set with lots of columns, this code takes too much time.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Jan': [10, 20], 'Feb': [3, 5], 'Mar': [30, 4],
                   'Month': [3, 2], 'Year': [2016, 2016]})
#    Jan  Feb  Mar  Month  Year
# 0   10    3   30      3  2016
# 1   20    5    4      2  2016
df['Antal_1'] = np.nan
df['Antal_2'] = np.nan
for i in range(len(df)):
    if df['Year'][i] == 2016:
        # Month is 1-based, so Month - 1 is the position of that month's column
        df.loc[i, 'Antal_1'] = df.iloc[i, df['Month'][i] - 1]
        df.loc[i, 'Antal_2'] = df.iloc[i, df['Month'][i] - 2]
    else:
        df.loc[i, 'Antal_1'] = df.iloc[i, -1]
        df.loc[i, 'Antal_2'] = df.iloc[i, -2]
df
#    Jan  Feb  Mar  Month  Year  Antal_1  Antal_2
# 0   10    3   30      3  2016       30        3
# 1   20    5    4      2  2016        5       20
You should see a marginal speed-up by using df.apply instead of iterating over rows:
import pandas as pd
df = pd.DataFrame({'Jan': [10, 20], 'Feb': [3, 5], 'Mar': [30, 4],
                   'Month': [3, 2], 'Year': [2016, 2016]})
df = df[['Jan', 'Feb', 'Mar', 'Month', 'Year']]
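def calculator(row):
    m1 = row['Month']                   # 1-based month number
    m2 = row.index.get_loc('Month')     # position of the 'Month' column
    # Use .iloc for positional access; row[...] would look up labels
    if row['Year'] == 2016:
        return row.iloc[int(m1 - 1)], row.iloc[int(m1 - 2)]
    return row.iloc[m2 - 1], row.iloc[m2 - 2]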
df['Antal_1'], df['Antal_2'] = list(zip(*df.apply(calculator, axis=1)))
#    Jan  Feb  Mar  Month  Year  Antal_1  Antal_2
# 0   10    3   30      3  2016       30        3
# 1   20    5    4      2  2016        5       20
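Since df.apply with axis=1 still calls a Python function once per row, a fully vectorized lookup may scale better on wide or long frames. The following is only a sketch of that idea using NumPy integer-array indexing. It assumes every column is numeric (so df.to_numpy() gives a single numeric array) and that the month columns sit at positions 0, 1, 2, ... in calendar order. The names values, rows, col_1 and col_2 are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Jan': [10, 20], 'Feb': [3, 5], 'Mar': [30, 4],
                   'Month': [3, 2], 'Year': [2016, 2016]})

values = df.to_numpy()                    # snapshot taken before the new columns exist
rows = np.arange(len(df))                 # one row position per row
month = df['Month'].to_numpy()            # 1-based month numbers
month_pos = df.columns.get_loc('Month')   # position of the 'Month' column
is_2016 = (df['Year'] == 2016).to_numpy()

# Column position to read for each row, mirroring the two branches of calculator()
col_1 = np.where(is_2016, month - 1, month_pos - 1)
col_2 = np.where(is_2016, month - 2, month_pos - 2)

# Pick one value per row with paired (row, column) positions
df['Antal_1'] = values[rows, col_1]
df['Antal_2'] = values[rows, col_2]
```

As with calculator, a row with Month equal to 1 produces a negative position for Antal_2, which wraps around to the last column; the question does not say what should happen in that case.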
It's not clear to me what you want when the year is not 2016, so I've set the value to 100. Show an example and I can finish it. If it should just be NaN, you can remove the first two lines below.
df['Antal_1'] = 100
df['Antal_2'] = 100
df.loc[df['Year']==2016, 'Antal_1'] = df[df.columns[df.columns.get_loc("Month")-1]]
df.loc[df['Year']==2016, 'Antal_2'] = df[df.columns[df.columns.get_loc("Month")-2]]
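Note that the snippet above reads the same two columns (the ones immediately to the left of Month) for every 2016 row, so on the sample data it would give Antal_1 = 4 for the second row rather than the 5 shown in the question. If the value should instead depend on each row's own Month, the same mask-and-assign pattern could be combined with a per-row lookup. This is a sketch under assumptions: the list month_cols and the third row (year 2015) are hypothetical additions for illustration, and 100 is kept as the placeholder default.

```python
import numpy as np
import pandas as pd

# Third row (Year 2015) is made up to exercise the default branch
df = pd.DataFrame({'Jan': [10, 20, 7], 'Feb': [3, 5, 8], 'Mar': [30, 4, 9],
                   'Month': [3, 2, 1], 'Year': [2016, 2016, 2015]})

month_cols = ['Jan', 'Feb', 'Mar']   # assumed: month columns in calendar order
mask = df['Year'] == 2016

# Placeholder default for rows where the year is not 2016
df['Antal_1'] = 100
df['Antal_2'] = 100

# Index only the 2016 rows, so Month values in other rows are never used
sub = df.loc[mask, month_cols].to_numpy()
month = df.loc[mask, 'Month'].to_numpy()
rows = np.arange(len(sub))

df.loc[mask, 'Antal_1'] = sub[rows, month - 1]   # value for the row's own month
df.loc[mask, 'Antal_2'] = sub[rows, month - 2]   # value for the month before it
```

Restricting the lookup to the masked rows means only the month columns need to share a dtype, and rows outside the mask keep the default untouched.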