I have a pandas dataframes as below.
import pandas as pd
df1=pd.DataFrame({'Group1':['a','b','c','d','e'],'Group2':["f","g","h","i","j"],'Group3':['k','L','m','n',"0"]})
print(df1)
df2=pd.DataFrame({'Group1':[0,0,2,1,0],'Group2':[1,2,0,0,0],'Group3':[0,0,0,1,1]})
print(df2)
df1 is a dataframe of students' name. df2 is a dataframe of their test score.
For example, if I compare df1 and df2, g and c have test score 2.
n,o,f,d have value 1.
Rest of them have value 0.
I want to compare df1 and df2, and then extract students' name and score.
The expected output dataframe is like below.
df3=pd.DataFrame({'2':["g","c","NaN","NaN","NaN","NaN","NaN","NaN","NaN"],'1':["n","o","f","d","NaN","NaN","NaN","NaN","NaN"],'0':["k","L","m","h","i","j","a","b","e"]})
print(df3)
My actual dataframe is much larger than above samples and wonder there is any good way of extraction without specifying cells. Your help will be greatly appreciated. Thank you.
This is pivot
problem after melt
, I am using crosstab
, If you want to know more , please check this
i=df2.melt().groupby('value').cumcount()
c=df2.melt().value
v=df1.melt().value
pd.crosstab(index=i,columns=c,values=v,aggfunc='sum')
value 0 1 2
row_0
0 a d c
1 b f g
2 e n None
3 h 0 None
4 i None None
5 j None None
6 k None None
7 L None None
8 m None None
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.