Comparing two MultiIndex dataframes of unequal size

Question

I have two MultiIndex dataframes (df1 and df2) of similar shape but different size.

I want to compare the columns of the two dataframes and show the comparison on the scaffold of the bigger dataframe, with blanks in the rows that don't match.

df1:

               col1
one two three      
a   1.0 abc       1
        mno       2
        xyz       3
    2.0 abc       4
        mno       5
        xyz       6
b   1.0 abc       7
        mno       8
        xyz       9
    2.0 abc      10
        mno      11
        xyz      12

df2:

                0
one two three    
a   1.0 abc    18
        mno    18
        xyz    19
        lpq    18
    2.0 abc     7
        mno     4
        xyz    13
        lpq     8
b   1.0 abc     8
        mno     5
        xyz     4
        lpq    14
    2.0 abc    12
        mno    16
        xyz     6
        lpq     7
c   1.0 abc     5
        mno     0
        xyz     0
        lpq    19
    2.0 abc    14
        mno     7
        xyz     0
        lpq     6

I've already tried comparing the two dataframes with a simple difference, hoping that the resulting third dataframe would contain empty rows wherever there is a mismatch. Instead I ended up with a much bigger dataframe containing multiple copies of the same row and many empty ranges of rows.

To recreate the dataframes:

import pandas as pd
import numpy as np


index_1 = pd.MultiIndex.from_product([['a','b'],[1.,2],['abc','mno','xyz']], names = ['one','two','three'])
df1 =  pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9,10,11,12]}, index = index_1)



index_2 = pd.MultiIndex.from_product([['a','b','c'],[1.,2],['abc','mno','xyz', 'lpq']], names = ['one','two','three'])
df2 =  pd.DataFrame(np.random.randint(0,20,size=(24, 1)), index = index_2)

My desired dataframe should look like the bigger df:

                0
one two three    
a   1.0 abc    18
        mno    18
        xyz    19
        lpq     
    2.0 abc     7
        mno     4
        xyz    13
        lpq      
b   1.0 abc     8
        mno     5
        xyz     4
        lpq     
    2.0 abc    12
        mno    16
        xyz     6
        lpq      
c   1.0 abc     5
        mno     0
        xyz     0
        lpq     
    2.0 abc    14
        mno     7
        xyz     0
        lpq

This problem has baffled me for days and I would appreciate any help.

Answer 1

where + isin

df2.where(pd.Series(df2.index.isin(df1.index), 
                    index=df2.index))
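As a hedged, self-contained sketch of the `where` + `isin` idea, the snippet below rebuilds the two frames from the question and applies the mask. The fixed random seed and the names `in_df1`, `masked`, `same_three` and `masked_by_three` are assumptions added for illustration. The second mask, which matches on the `three` level only, is a guess at what the desired output in the question implies for the `c` rows; it is not part of the answer.

```python
import numpy as np
import pandas as pd

# Frames as in the question; the seed is an assumption added for reproducibility.
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=['one', 'two', 'three'])
df1 = pd.DataFrame({'col1': list(range(1, 13))}, index=index_1)

index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']],
    names=['one', 'two', 'three'])
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

# True where the full (one, two, three) label of a df2 row also exists in df1.
in_df1 = pd.Series(df2.index.isin(df1.index), index=df2.index)

# Same shape as df2; rows whose label is absent from df1 become NaN.
masked = df2.where(in_df1)

# Hypothetical variant: match on the 'three' level only, so 'c' rows survive.
same_three = pd.Series(
    df2.index.get_level_values('three').isin(df1.index.get_level_values('three')),
    index=df2.index)
masked_by_three = df2.where(same_three)

print(masked)
print(masked_by_three)
```

With full-label matching, every row under `c` is masked as well as the `lpq` rows, because `c` never appears in df1. The level-only variant blanks just the `lpq` rows.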

Another way is to reindex twice (if the indices are unique):

df2.reindex(df1.index).reindex(df2.index)

                0
one two three      
a   1.0 abc    11.0
        mno     5.0
        xyz     8.0
        lpq     NaN
    2.0 abc     5.0
        mno     2.0
        xyz    19.0
        lpq     NaN
b   1.0 abc     5.0
        mno    19.0
        xyz    11.0
        lpq     NaN
    2.0 abc     2.0
        mno    13.0
        xyz    12.0
        lpq     NaN
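To make the two reindex steps explicit, here is a minimal sketch under the same setup as the question. The seed and the names `step1`, `step2` and `expected` are illustrative assumptions. It checks index uniqueness first, since the answer states that as a precondition, and then compares the result with the `where` + `isin` version.

```python
import numpy as np
import pandas as pd

# Frames as in the question; the seed is an assumption added for reproducibility.
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=['one', 'two', 'three'])
df1 = pd.DataFrame({'col1': list(range(1, 13))}, index=index_1)

index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']],
    names=['one', 'two', 'three'])
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

# The double reindex relies on unique labels in both indices.
assert df1.index.is_unique and df2.index.is_unique

# Step 1: keep only the df2 rows whose labels appear in df1 (12 rows here).
step1 = df2.reindex(df1.index)

# Step 2: expand back to df2's full index; labels dropped in step 1 become NaN.
step2 = step1.reindex(df2.index)

# Cross-check against the where + isin approach.
expected = df2.where(pd.Series(df2.index.isin(df1.index), index=df2.index))
print(np.allclose(step2[0].to_numpy(), expected[0].to_numpy(), equal_nan=True))
```

The intermediate `step1` is also a convenient place to compare values against df1 directly, since it shares df1's index.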

You can replace the NaN with blanks, but I don't advise that, since the column then becomes object dtype.
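The dtype change can be checked directly. The sketch below is an illustration under the question's setup, with an assumed seed and hypothetical names (`blanked`, `nullable`, `as_text`). It also shows two alternatives that are not part of the answer: pandas' nullable `Int64` dtype, and blanking only in the printed representation via `to_string(na_rep='')`.

```python
import numpy as np
import pandas as pd

# Frames as in the question; the seed is an assumption added for reproducibility.
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=['one', 'two', 'three'])
index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']],
    names=['one', 'two', 'three'])
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

masked = df2.where(pd.Series(df2.index.isin(index_1), index=df2.index))
print(masked[0].dtype)    # numeric column: float64 because of the NaNs

# Filling with empty strings mixes numbers and text in one column.
blanked = masked.fillna('')
print(blanked[0].dtype)   # object, so numeric operations no longer apply cleanly

# Alternative 1 (assumption, not from the answer): nullable integers keep
# the values as integers and show missing entries as <NA>.
nullable = masked[0].astype('Int64')

# Alternative 2 (assumption, not from the answer): blank only the display,
# leaving the stored data numeric.
as_text = masked.to_string(na_rep='')
print(as_text)
```

Keeping the data numeric and blanking only at display time avoids having to convert back before any further arithmetic.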

Answer 1: accepted, 2019-08-29 16:52:11
