Given two MultiIndex DataFrames (df1 and df2), I want to group df1 and apply a transformation that adds the corresponding values from df2 to each group of df1.
import pandas as pd
import numpy as np

def do_transform(x):
    # _index_of_x and _column_name_of_x are placeholders for the values I do not know how to obtain
    return np.add(x, df2.loc[_index_of_x, _column_name_of_x])

df1.groupby(level=[0, 1]).transform(do_transform)
How can I retrieve the index and column name inside a pandas transform?
EDIT:
df1 and df2 have the same number of rows, but df2 contains more columns.
I think a join across the MultiIndex levels might be a better fit here.
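A minimal sketch of what such a join could look like, assuming both frames list their rows in the same order within each (alpha, beta) key. Because the keys repeat, a plain index join would pair every duplicate with every other duplicate, so this sketch first appends a per-key counter. The helper `with_position`, the level name `pos`, the `_df2` suffix, and the extra column `extra` are all made up for illustration.

```python
import pandas as pd

# Toy frames mirroring the shape described in the question; df2 carries an
# extra column ("extra" is a hypothetical name).
index = pd.MultiIndex.from_arrays(
    [list("AAABBB"), [1, 1, 2, 3, 3, 4]], names=["alpha", "beta"]
)
df1 = pd.DataFrame({"gamma": [2, 4, 6, 8, 10, 12]}, index=index)
df2 = pd.DataFrame(
    {"gamma": [20, 40, 60, 80, 100, 120], "extra": [0] * 6}, index=index
)


def with_position(df):
    """Append a per-key row counter so every index entry becomes unique."""
    pos = df.groupby(level=["alpha", "beta"]).cumcount()
    return df.set_index(pd.Index(pos.to_numpy(), name="pos"), append=True)


left = with_position(df1)
right = with_position(df2)

# Join on all three index levels, keeping only the columns df1 also has.
joined = left.join(right[left.columns].add_suffix("_df2"))

# Add the matched columns and drop the helper level again.
result = (joined["gamma"] + joined["gamma_df2"]).to_frame("gamma").droplevel("pos")
print(result)
```

The counter is only meaningful if the row order within each key carries the pairing; if it does not, some other column that identifies the rows would be needed instead.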
Anyhow, proceeding with a transform: in response to scari's question, I will assume both frames have the same size.
"""
# data1.csv
alpha,beta,gamma
A,1,2
A,1,4
A,2,6
B,3,8
B,3,10
B,4,12
# data2.csv
alpha,beta,gamma
A,1,20
A,1,40
A,2,60
B,3,80
B,3,100
B,4,120
"""
import pandas as pd

df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')
df1.set_index(['alpha', 'beta'], inplace=True)
df2.set_index(['alpha', 'beta'], inplace=True)

def do_transform(x):
    # select the rows of df2 whose index labels appear in the current group
    return x + df2.loc[df2.index.isin(x.index)]

print(df1.groupby(level=[0, 1]).transform(do_transform))
which produces:
            gamma
alpha beta
A     1        22
      1        44
      2        66
B     3        88
      3       110
      4       132
The same approach also works when there is more than one column:
import pandas as pd
import numpy as np
"""
# data1.csv
alpha,beta,gamma,omega
A,1,2,1
A,1,4,1
A,2,6,1
B,3,8,1
B,3,10,1
B,4,12,1
# data2.csv
alpha,beta,gamma,omega
A,1,20,2
A,1,40,2
A,2,60,2
B,3,80,2
B,3,100,2
B,4,120,2
"""
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')
df1.set_index(['alpha', 'beta'], inplace=True)
df2.set_index(['alpha', 'beta'], inplace=True)

def do_transform(x):
    # select the rows of df2 that share the group's index labels, all columns
    return x + df2.loc[x.index.unique(), :]

print(df1.groupby(level=[0, 1]).transform(do_transform))
produces:
            gamma  omega
alpha beta
A     1        22      3
      1        44      3
      2        66      3
B     3        88      3
      3       110      3
      4       132      3
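On the original question of getting at the index and column name: one option that does not go through groupby is a column-wise `DataFrame.apply`, where each column arrives as a Series whose `.name` is the column label and whose `.index` is the row MultiIndex. The sketch below assumes df1 and df2 have the same rows in the same order, so the values can be added positionally; the function name `add_matching_column` and the extra column `extra` are made up for illustration.

```python
import pandas as pd

# Toy frames with the same rows in the same order; df2 has one extra column
# ("extra" is a hypothetical name).
index = pd.MultiIndex.from_arrays(
    [list("AAABBB"), [1, 1, 2, 3, 3, 4]], names=["alpha", "beta"]
)
df1 = pd.DataFrame({"gamma": [2, 4, 6, 8, 10, 12], "omega": [1] * 6}, index=index)
df2 = pd.DataFrame(
    {"gamma": [20, 40, 60, 80, 100, 120], "omega": [2] * 6, "extra": [0] * 6},
    index=index,
)


def add_matching_column(col):
    # col.name is the column label and col.index is the row MultiIndex.
    # to_numpy() makes the addition positional, which sidesteps label
    # alignment on the duplicated index entries.
    return col + df2[col.name].to_numpy()


result = df1.apply(add_matching_column)
print(result)
```

If the two frames are not in the same row order, the positional addition would pair the wrong rows, so the order assumption needs checking first.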