How to insert new rows in a multi index dataframe from a reference dataframe

I have seen several posts about this, but I could not work out how to apply merge, join, or concat with a second dataframe. How can I fill the multi-index dataframe df1 from a reference dataframe df2, so that every dispatch_no missing from df1 under each date is added with the corresponding values from df2?

df1

date      dispatch_no  A   B   C
2019-12-2   1          a1  b1  c1
            2          a2  b2  c2
            5          a5  b5  c5
2019-12-3   1          d1  e1  f1
            3          d3  e3  f3

reference dataframe df2

dispatch_no  M   N   O  
1            M1  N1  O1
2            M2  N2  O2
3            M3  N3  O3             
4            M4  N4  O4
5            M5  N5  O5

expected output

date      dispatch_no  A   B   C
2019-12-2   1          a1  b1  c1
            2          a2  b2  c2
            3          M3  N3  O3  
            4          M4  N4  O4
            5          a5  b5  c5
2019-12-3   1          d1  e1  f1
            2          M2  N2  O2
            3          d3  e3  f3
            4          M4  N4  O4
            5          M5  N5  O5

Use:

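# Move the dates into the columns so the index holds only dispatch_no.
df1_unstack = df1.unstack('date')

# Reindex over every dispatch_no from min to max, then restore the dates and sort.
new_df = (df1_unstack.reindex(index=list(range(df1_unstack.index.min(),
                                               df1_unstack.index.max() + 1)))
          .stack(dropna=False)
          .swaplevel()
          .sort_index())

# Relabel df2's columns to match new_df, then fill the missing rows from it.
df_fill = df2.set_index('dispatch_no')
df_fill.columns = new_df.columns
new_df = new_df.fillna(df_fill)

print(new_df)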


                        A   B   C
date      dispatch_no            
2019-12-2 1            a1  b1  c1
          2            a2  b2  c2
          3            M3  N3  O3
          4            M4  N4  O4
          5            a5  b5  c5
2019-12-3 1            d1  e1  f1
          2            M2  N2  O2
          3            d3  e3  f3
          4            M4  N4  O4
          5            M5  N5  O5
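As a rough, self-contained variation on the same idea, the sketch below skips the unstack/stack round trip and takes the dispatch numbers from df2 instead of a range over df1. It assumes dispatch_no is unique in df2 and that df2's value columns correspond positionally to A, B, C. The names ref, dates, full_index and fill are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data mirroring the frames printed in this answer.
idx = pd.MultiIndex.from_tuples(
    [("2019-12-2", 1), ("2019-12-2", 2), ("2019-12-2", 5),
     ("2019-12-3", 1), ("2019-12-3", 3)],
    names=["date", "dispatch_no"],
)
df1 = pd.DataFrame(
    [["a1", "b1", "c1"], ["a2", "b2", "c2"], ["a5", "b5", "c5"],
     ["d1", "e1", "f1"], ["d3", "e3", "f3"]],
    index=idx, columns=["A", "B", "C"],
)
df2 = pd.DataFrame({
    "dispatch_no": [1, 2, 3, 4, 5],
    "M": [f"M{i}" for i in range(1, 6)],
    "N": [f"N{i}" for i in range(1, 6)],
    "O": [f"O{i}" for i in range(1, 6)],
})

# Reference rows keyed by dispatch_no, with columns relabelled to A, B, C
# (assumption: M, N, O map positionally onto A, B, C).
ref = df2.set_index("dispatch_no")
ref.columns = df1.columns

# Every (date, dispatch_no) pair, using df2's dispatch numbers.
dates = df1.index.get_level_values("date").unique()
full_index = pd.MultiIndex.from_product(
    [dates, ref.index], names=["date", "dispatch_no"]
)

# Look up the reference row for each pair and align it to the full index.
fill = ref.reindex(full_index.get_level_values("dispatch_no"))
fill.index = full_index

# Rows missing from df1 become NaN after reindex and are filled from `fill`.
new_df = df1.reindex(full_index).fillna(fill)
print(new_df)
```

Because fill carries exactly the same index and columns as the reindexed frame, fillna does not have to align a single-level index against a MultiIndex.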

Dataframes

print(df1)

                        A   B   C
date      dispatch_no            
2019-12-2 1            a1  b1  c1
          2            a2  b2  c2
          5            a5  b5  c5
2019-12-3 1            d1  e1  f1
          3            d3  e3  f3

print(df2)
   dispatch_no   M   N   O
0            1  M1  N1  O1
1            2  M2  N2  O2
2            3  M3  N3  O3
3            4  M4  N4  O4
4            5  M5  N5  O5
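To reproduce the two frames printed above, something like the following should work. It is only a reconstruction from the printed output: it assumes the dates are plain strings and dispatch_no is an integer, which the printout alone does not confirm.

```python
import pandas as pd

# df1: MultiIndex (date, dispatch_no) with value columns A, B, C.
df1 = pd.DataFrame(
    {
        "A": ["a1", "a2", "a5", "d1", "d3"],
        "B": ["b1", "b2", "b5", "e1", "e3"],
        "C": ["c1", "c2", "c5", "f1", "f3"],
    },
    index=pd.MultiIndex.from_tuples(
        [
            ("2019-12-2", 1),
            ("2019-12-2", 2),
            ("2019-12-2", 5),
            ("2019-12-3", 1),
            ("2019-12-3", 3),
        ],
        names=["date", "dispatch_no"],
    ),
)

# df2: reference frame with dispatch_no as an ordinary column.
df2 = pd.DataFrame(
    {
        "dispatch_no": [1, 2, 3, 4, 5],
        "M": ["M1", "M2", "M3", "M4", "M5"],
        "N": ["N1", "N2", "N3", "N4", "N5"],
        "O": ["O1", "O2", "O3", "O4", "O5"],
    }
)

print(df1)
print(df2)
```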

Based on the data provided by @ansev above, here is another way (first run df2 = df2.set_index('dispatch_no') if dispatch_no is not already the index):

import pandas as pd

c = df1.index.get_level_values(0).unique()        # ['2019-12-2', '2019-12-3']
m = pd.concat([df2] * len(c))                     # repeat df2 once per date
idx = pd.MultiIndex.from_product([c, df2.index])  # every (date, dispatch_no) pair
m.index = idx                                     # give m the full MultiIndex

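Then reindex df1 to the full index and use combine_first() to fill the gaps from m, after renaming m's columns to match df1:

final = df1.reindex(idx).combine_first(m.rename(columns=dict(zip(m.columns, df1.columns))))
print(final)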

              A   B   C
2019-12-2 1  a1  b1  c1
          2  a2  b2  c2
          3  M3  N3  O3
          4  M4  N4  O4
          5  a5  b5  c5
2019-12-3 1  d1  e1  f1
          2  M2  N2  O2
          3  d3  e3  f3
          4  M4  N4  O4
          5  M5  N5  O5
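Put together as one runnable script, the combine_first() approach might look like the sketch below. The sample data is rebuilt from the printed frames (dates assumed to be strings), and the names dates, full_index and fallback are illustrative stand-ins for c, idx and m.

```python
import pandas as pd

# Sample data rebuilt from the printed frames.
df1 = pd.DataFrame(
    [["a1", "b1", "c1"], ["a2", "b2", "c2"], ["a5", "b5", "c5"],
     ["d1", "e1", "f1"], ["d3", "e3", "f3"]],
    index=pd.MultiIndex.from_tuples(
        [("2019-12-2", 1), ("2019-12-2", 2), ("2019-12-2", 5),
         ("2019-12-3", 1), ("2019-12-3", 3)],
        names=["date", "dispatch_no"],
    ),
    columns=["A", "B", "C"],
)
df2 = pd.DataFrame({
    "dispatch_no": [1, 2, 3, 4, 5],
    "M": ["M1", "M2", "M3", "M4", "M5"],
    "N": ["N1", "N2", "N3", "N4", "N5"],
    "O": ["O1", "O2", "O3", "O4", "O5"],
}).set_index("dispatch_no")

# Full grid of (date, dispatch_no) pairs.
dates = df1.index.get_level_values("date").unique()
full_index = pd.MultiIndex.from_product(
    [dates, df2.index], names=df1.index.names
)

# Tile the reference rows once per date; the row order matches from_product.
fallback = pd.concat([df2] * len(dates))
fallback.index = full_index
fallback.columns = df1.columns  # assumption: M, N, O map onto A, B, C

# Values from df1 win; gaps are taken from the tiled reference frame.
final = df1.reindex(full_index).combine_first(fallback)
print(final)
```

Tiling relies on pd.concat and MultiIndex.from_product producing rows in the same order (all dispatch numbers for the first date, then the next), which holds here because both iterate over df2.index in the inner position.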
