I have a problem similar to "Merge two dataframes with multi-index".
in:
import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)
out:
            A   B   C
first  one  a1  b1  c1
       two  a2  b2  c2
second one  a3  b3  c3
       two  a4  b4  c4
in:
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
row_y3 = ['d3','e3','f3']
index_arrays = [np.array(['first','first', 'second',]), np.array(['one','three','two'])]
df2 = pd.DataFrame([row_y1,row_y2,row_y3], columns=list('DEF'), index=index_arrays)
print(df2)
out:
              D   E   F
first  one    d1  e1  f1
       three  d2  e2  f2
second two    d3  e3  f3
In other words, how can I merge them to get df3 (constructed by hand below)?
in:
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
row_y3 = ['d3','e3','f3']
row_z1 = row_x1 + row_y1
row_z2 = row_x2 + [np.nan, np.nan, np.nan]
row_z3 = [np.nan, np.nan, np.nan] + row_y2
row_z4 = row_x3 + [np.nan, np.nan, np.nan]
row_z5 = row_x4 + row_y3
index_arrays = [np.array(['first', 'first', 'first', 'second', 'second']), np.array(['one','two','three','one','two'])]
df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4,row_z5], columns=list('ABCDEF'), index=index_arrays)
print(df3)
out:
                A    B    C    D    E    F
first  one     a1   b1   c1   d1   e1   f1
       two     a2   b2   c2  NaN  NaN  NaN
       three  NaN  NaN  NaN   d2   e2   f2
second one     a3   b3   c3  NaN  NaN  NaN
       two     a4   b4   c4   d3   e3   f3
PS. Thanks to @Andreuccio for the original question!
Thanks @Ajay Verma and @EBDS. That is indeed a solution for manually created DataFrames, but I am confused by the following situation:
I have two DataFrames of statistics, and I copied the corresponding rows from each to pass to pd.merge():
in:
df1 = data1[data1.index.get_level_values(0) == 'BASIC_GZAG_TMB'].copy()
out:
                         0       1       2       3
BASIC_GZAG_TMB 1     127.0   179.0   190.0   239.0
               2      38.0    23.0    21.0    29.0
               3      37.0    27.0    32.0    37.0
               4       5.0    14.0    11.0    23.0
               5      31.0    56.0    41.0    65.0
               7     389.0   258.0   337.0   243.0
               NaN  1323.0  1388.0  1307.0  1311.0
in:
df2 = data2[data2.index.get_level_values(0) == 'BASIC_GZAG_TMB'].copy()
out:
                         0       1       2       3
BASIC_GZAG_TMB 1     207.0   232.0   252.0   223.0
               2      26.0    18.0    19.0    20.0
               3      43.0    41.0    50.0    42.0
               4      35.0    27.0    37.0    15.0
               5      54.0    52.0    78.0    64.0
               6       1.0  1306.0     1.0     4.0
               7     206.0   263.0   227.0   230.0
               NaN  1374.0  1306.0  1282.0  1348.0
Then I merged df1 and df2 by:
df1.merge(df2, left_index=True, right_index=True, how='outer')
out:
                       0_x     1_x     2_x     3_x     0_y     1_y     2_y  \
BASIC_GZAG_TMB 1     127.0   179.0   190.0   239.0   207.0   232.0   252.0
               2      38.0    23.0    21.0    29.0    26.0    18.0    19.0
               3      37.0    27.0    32.0    37.0    43.0    41.0    50.0
               4       5.0    14.0    11.0    23.0    35.0    27.0    37.0
               5      31.0    56.0    41.0    65.0    54.0    52.0    78.0
               7     389.0   258.0   337.0   243.0   206.0   263.0   227.0
               NaN  1323.0  1388.0  1307.0  1311.0  1374.0  1306.0  1282.0

                       3_y
BASIC_GZAG_TMB 1     223.0
               2      20.0
               3      42.0
               4      15.0
               5      64.0
               7     230.0
               NaN  1348.0
I am confused that index 6, which exists in df2, is missing from the result.
I know that df2.merge(df1, ...) could work around it. But data1 and data2 are generated dynamically, so I don't know in advance which one has more index values. I just want the union of df1 and df2.
You can use pandas merge for this. Link to documentation: link
df = df1.merge(df2, left_index=True, right_index=True, how='outer')
print(df)
Output
                A    B    C    D    E    F
first  one     a1   b1   c1   d1   e1   f1
       three  NaN  NaN  NaN   d2   e2   f2
       two     a2   b2   c2  NaN  NaN  NaN
second one     a3   b3   c3  NaN  NaN  NaN
       two     a4   b4   c4   d3   e3   f3
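For an index-on-index outer join, DataFrame.join is a shorter spelling of the same operation. A minimal sketch, rebuilding df1 and df2 from the question (the idx1/idx2 names are only for illustration):

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([['first', 'first', 'second', 'second'],
                                  ['one', 'two', 'one', 'two']])
idx2 = pd.MultiIndex.from_arrays([['first', 'first', 'second'],
                                  ['one', 'three', 'two']])
df1 = pd.DataFrame([['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'],
                    ['a3', 'b3', 'c3'], ['a4', 'b4', 'c4']],
                   columns=list('ABC'), index=idx1)
df2 = pd.DataFrame([['d1', 'e1', 'f1'], ['d2', 'e2', 'f2'],
                    ['d3', 'e3', 'f3']],
                   columns=list('DEF'), index=idx2)

# join() aligns on the index by default; how='outer' keeps the union of keys
joined = df1.join(df2, how='outer')
print(joined)
```

Rows that exist in only one frame get NaN in the other frame's columns, as in the merge output above.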
If you need to sort by the number words (one, two, three, ...) rather than alphabetically:
Code:
from number_parser import parse
dfx = (
    df1.merge(df2, left_index=True, right_index=True, how='outer')
       .sort_index(key=lambda x: np.vectorize(parse)(x).astype(float))
)
You may need to install number_parser first:
!pip install number_parser
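If installing an extra package is not an option, a plain dictionary can do the same job for a small, known set of words. This is only a sketch: WORD_RANK and word_rank are made-up names, and the frame below is a one-column stand-in for the merged result.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays([['first', 'first', 'first', 'second', 'second'],
                                   ['one', 'three', 'two', 'one', 'two']])
merged = pd.DataFrame({'A': ['a1', np.nan, 'a2', 'a3', 'a4']}, index=index)

# Hypothetical lookup table; extend it to cover the words in your data
WORD_RANK = {'one': 1, 'two': 2, 'three': 3}

def word_rank(level):
    # sort_index calls the key once per index level
    if level.isin(list(WORD_RANK)).all():
        return level.map(WORD_RANK)
    return level  # levels without number words keep their default ordering

ordered = merged.sort_index(key=word_rank)
print(ordered)
```

The key function only rewrites a level when every value in it is a known number word, so the outer level ('first', 'second') is still sorted as ordinary strings.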
Update:
As I don't have the new data, I used the original data to test the "missing 6" case. I also changed the column names to be the same and added a NaN index.
data1 = df1.copy(deep=True)
data2 = df2.copy(deep=True)
df1 = data1[data1.index.get_level_values(0) == 'first'].copy()
df2 = data2[data2.index.get_level_values(0) == 'first'].copy()
dfx = df1.merge(df2, left_index=True, right_index=True, how='outer').sort_index(
key=lambda x: np.vectorize(parse)(x)
)
As you can see, none of the values is missing. The problem probably does not lie in the merge itself; you will need to inspect the source data that gives rise to this situation.
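One possible way to inspect the source data, sketched here with made-up numbers shaped like the frames in the question (a float second level that contains NaN). It prints the dtype of each index level, then moves the index into ordinary columns and merges on them with indicator=True, so each row reports which frame it came from. The level_0 and level_1 names are what reset_index assigns to unnamed levels. This is a diagnostic aid under those assumptions, not a confirmed fix for the original data.

```python
import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([['G', 'G', 'G'], [1.0, 7.0, np.nan]])
idx2 = pd.MultiIndex.from_arrays([['G', 'G', 'G', 'G'], [1.0, 6.0, 7.0, np.nan]])
df1 = pd.DataFrame({0: [127.0, 389.0, 1323.0]}, index=idx1)
df2 = pd.DataFrame({0: [207.0, 1.0, 206.0, 1374.0]}, index=idx2)

# Keys only match when the level dtypes agree (e.g. 1 and '1' never match)
print([lvl.dtype for lvl in df1.index.levels])
print([lvl.dtype for lvl in df2.index.levels])

# Merge on columns instead of the index and record each row's origin
keys = ['level_0', 'level_1']
check = df1.reset_index().merge(df2.reset_index(), on=keys,
                                how='outer', indicator=True)
print(check)
```

In the _merge column, a key that exists only in df2 should show up as right_only; if a key is absent from check altogether, it was already missing from the input frames.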