I am trying to write a loop in pandas that calculates the difference between each pair of consecutive columns and puts each result in a new column:
Original df:
```
201601  201602  201603
   100     200     500
```
Desired output:
```
201601  201602  201603  201602_201601  201603_201602
   100     200     500            100            300
```
My code, which I adapted from a Stack Overflow post ("add columns to a data frame calculated by for loops in python"), is:
```python
for i in df.iloc[:,2:5]:
    for j in df.iloc[:,2:5]:
        if i == j:
            break
        else:
            bina = df[i]-df[j]
            df['MOM_' + str(j) + '_' + str(i)] = bina
df.head()
```
However, the output I get is as below:
```
201601  201602  201603  201602_201601  201603_201601  201603_201602
   100     200     500            100            400            300
```
I have managed to get what I need with `DataFrame.diff`, but I couldn't figure out the for-loop version. Any help would be greatly appreciated.
Thanks
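For reference, the extra `201603_201601` column comes from the `break` in the question's nested loop: for each outer column `i`, the inner loop visits every column to the left of `i` and only stops when it reaches `i` itself, so `i` gets paired with all earlier months rather than just the previous one. The sketch below traces this on plain labels (the list `cols` is an illustrative stand-in for the month columns, not part of the original code) and compares it with the adjacent pairs that `zip` produces.

```python
# Illustrative stand-in for the three month columns in the question.
cols = ['201601', '201602', '201603']

# Mirror the question's nested loop, recording which (i, j) pairs reach the
# subtraction instead of touching a DataFrame.
visited = []
for i in cols:
    for j in cols:
        if i == j:
            break  # the inner loop stops only when it reaches the outer column
        visited.append((i, j))

# Only directly adjacent (later, earlier) pairs, which is what the desired output needs.
adjacent = list(zip(cols[1:], cols))

print(visited)
print(adjacent)
```

The nested loop yields three pairs (one per column in the unwanted output), while `zip` yields only the two neighbouring pairs.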
Using `diff` and a simple list comprehension with `zip` to construct the new column names:
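```python
# Build '201602_201601', '201603_201602', ... by pairing each column with its left neighbour
cols = [f'{b}_{a}' for (a, b) in zip(df.columns, df.columns[1:])]
# diff(axis=1) subtracts each column's left neighbour; the first column is all-NaN, so dropna removes it
df[cols] = df.diff(axis=1).dropna(axis=1)
```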
```
   201601  201602  201603  201602_201601  201603_201602
0     100     200     500            100            300
```
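One caveat with `dropna(axis=1)`: it drops every column that contains a NaN, so if the real data has a missing value in some month, an extra column disappears and no longer lines up with `cols`. A minimal sketch of the same `diff` + `zip` idea wrapped in a helper (the name `add_consecutive_diffs` is hypothetical) that instead drops the first diff column by position, assuming the columns are numeric and already in chronological order:

```python
import pandas as pd


def add_consecutive_diffs(df: pd.DataFrame) -> pd.DataFrame:
    """Return df plus one '<later>_<earlier>' column per adjacent column pair.

    Hypothetical helper; assumes numeric columns in chronological order.
    """
    names = [f'{b}_{a}' for a, b in zip(df.columns, df.columns[1:])]
    # diff(axis=1) subtracts each column's left neighbour. The first column has
    # no neighbour (all NaN), so drop it by position rather than with dropna,
    # which would also drop any later column containing a genuine NaN.
    diffs = df.diff(axis=1).iloc[:, 1:]
    diffs.columns = names
    return pd.concat([df, diffs], axis=1)


df = pd.DataFrame({'201601': [100], '201602': [200], '201603': [500]})
print(add_consecutive_diffs(df))
```

The helper returns a new frame rather than modifying `df` in place, which makes it easy to apply to a subset of columns first.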
Avoid explicit `for` loops in pandas whenever a vectorized method such as `diff` can do the job; it is simpler and usually faster.
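Another loop-free variant, swapped in here for illustration rather than taken from either answer: subtract two column-offset slices of the underlying NumPy array. Subtracting `df.iloc[:, 1:] - df.iloc[:, :-1]` directly would not work, because pandas aligns the two slices by column label and produces mostly NaN; going through `to_numpy()` subtracts by position instead. Because no all-NaN column is created along the way, integer columns stay integers. This sketch assumes all columns are numeric.

```python
import pandas as pd

df = pd.DataFrame({'201601': [100], '201602': [200], '201603': [500]})

# Names like '201602_201601': each column paired with its left neighbour.
names = [f'{b}_{a}' for a, b in zip(df.columns, df.columns[1:])]

# Positional subtraction: every column minus the column to its left.
# Working on the NumPy array sidesteps pandas' column-label alignment.
values = df.to_numpy()
diffs = pd.DataFrame(values[:, 1:] - values[:, :-1], columns=names, index=df.index)

df = pd.concat([df, diffs], axis=1)
print(df)
```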
If you want to keep the loop, this fixes your code:
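```python
col = df.columns
for x, i in enumerate(col):
    for y, j in enumerate(col):
        # keep only the column j immediately to the right of i (the next month)
        if y - x == 1:
            # later month minus earlier month, e.g. 201602 - 201601 = 100
            bina = df[j] - df[i]
            df['MOM_' + str(j) + '_' + str(i)] = bina
df.columns
```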
```
Index(['201601', '201602', '201603', 'MOM_201602_201601',
       'MOM_201603_201602'],
      dtype='object')
```
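The nested loop with the `y - x == 1` check walks over every pair of columns just to keep the adjacent ones. A sketch that visits the neighbours directly with `zip`, assuming (as the question does) that the columns are in chronological order; the example frame and the name `months` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'201601': [100], '201602': [200], '201603': [500]})

# Snapshot the original labels so the columns added below are not iterated over.
months = list(df.columns)

# One pass over (earlier, later) neighbours; zip stops when months[1:] runs out.
for earlier, later in zip(months, months[1:]):
    df['MOM_' + str(later) + '_' + str(earlier)] = df[later] - df[earlier]

print(df.columns.tolist())
```

If the month columns are only part of a wider frame, as the `df.iloc[:,2:5]` in the question suggests, slice the labels first, e.g. `months = list(df.columns[2:5])`.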