
I have two multiindexed dataframes of unequal size that I want to compare

I have two multiindexed dataframes (df1 and df2) with the same index levels but different sizes.

I want to compare the columns of the two dataframes and show the result on the larger dataframe's index, leaving blanks in the rows that have no match.

df1:

               col1
one two three      
a   1.0 abc       1
        mno       2
        xyz       3
    2.0 abc       4
        mno       5
        xyz       6
b   1.0 abc       7
        mno       8
        xyz       9
    2.0 abc      10
        mno      11
        xyz      12
df2:
                0
one two three    
a   1.0 abc    18
        mno    18
        xyz    19
        lpq    18
    2.0 abc     7
        mno     4
        xyz    13
        lpq     8
b   1.0 abc     8
        mno     5
        xyz     4
        lpq    14
    2.0 abc    12
        mno    16
        xyz     6
        lpq     7
c   1.0 abc     5
        mno     0
        xyz     0
        lpq    19
    2.0 abc    14
        mno     7
        xyz     0
        lpq     6

I've already tried comparing the two dataframes with a simple difference, hoping that the resulting dataframe would contain empty rows wherever there is a mismatch. Instead I ended up with a much bigger dataframe containing multiple copies of the same row and many empty ranges of rows.

To recreate the dataframes:

import pandas as pd
import numpy as np


index_1 = pd.MultiIndex.from_product([['a','b'],[1.,2],['abc','mno','xyz']], names = ['one','two','three'])
df1 =  pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9,10,11,12]}, index = index_1)



index_2 = pd.MultiIndex.from_product([['a','b','c'],[1.,2],['abc','mno','xyz', 'lpq']], names = ['one','two','three'])
df2 =  pd.DataFrame(np.random.randint(0,20,size=(24, 1)), index = index_2)


My desired dataframe should have the shape of the bigger df:

                0
one two three    
a   1.0 abc    18
        mno    18
        xyz    19
        lpq     
    2.0 abc     7
        mno     4
        xyz    13
        lpq      
b   1.0 abc     8
        mno     5
        xyz     4
        lpq     
    2.0 abc    12
        mno    16
        xyz     6
        lpq      
c   1.0 abc     5
        mno     0
        xyz     0
        lpq     
    2.0 abc    14
        mno     7
        xyz     0
        lpq      

This problem has baffled me for days and I would appreciate any help.

Use where + isin:

df2.where(pd.Series(df2.index.isin(df1.index), 
                    index=df2.index))

Another way is to reindex twice (if the indices are unique):

df2.reindex(df1.index).reindex(df2.index)

                0
one two three      
a   1.0 abc    11.0
        mno     5.0
        xyz     8.0
        lpq     NaN
    2.0 abc     5.0
        mno     2.0
        xyz    19.0
        lpq     NaN
b   1.0 abc     5.0
        mno    19.0
        xyz    11.0
        lpq     NaN
    2.0 abc     2.0
        mno    13.0
        xyz    12.0
        lpq     NaN
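
As a rough check that the two approaches above agree, here is a minimal sketch that rebuilds the question's dataframes and compares the results. The seeded generator (`np.random.default_rng(0)`) and the names `mask`, `via_where` and `via_reindex` are assumptions added for reproducibility; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the two indexes from the question.
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2.], ['abc', 'mno', 'xyz']],
    names=['one', 'two', 'three'])
index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2.], ['abc', 'mno', 'xyz', 'lpq']],
    names=['one', 'two', 'three'])

df1 = pd.DataFrame({'col1': range(1, 13)}, index=index_1)

# Assumption: a fixed seed so the run is repeatable (the question is unseeded).
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

# Approach 1: keep df2's values only where its index tuple also occurs in df1.
mask = df2.index.isin(df1.index)
via_where = df2.where(pd.Series(mask, index=df2.index))

# Approach 2: shrink to df1's index, then expand back to df2's index.
via_reindex = df2.reindex(df1.index).reindex(df2.index)

# Both give a float column with NaN on the unmatched rows.
print(via_where.equals(via_reindex))
```

With either approach every row under `c` is also NaN, since none of those index tuples exist in `df1.index`.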

You can replace the NaN with blanks, but I don't advise it, since the column then becomes object dtype.
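
To illustrate the dtype concern, here is a small sketch on a toy frame (the frame `df` and the variable names are hypothetical, chosen only for this example). It also shows two possible alternatives, offered as suggestions rather than as part of the original answer: the nullable `Int64` dtype, and blanking only at display time with `na_rep`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the masked result: a float column with one missing value.
df = pd.DataFrame({0: [18.0, np.nan, 7.0]})

# Filling with an empty string forces the column to object dtype.
blanked = df.fillna('')
print(blanked[0].dtype)

# Alternative 1: the nullable integer dtype keeps the column numeric,
# with <NA> marking the missing rows.
nullable = df.astype('Int64')
print(nullable[0].dtype)

# Alternative 2: keep the data as-is and blank out NaN only when printing.
shown = df.to_string(na_rep='')
print(shown)
```

Keeping the column numeric means later arithmetic and aggregation still work without converting back.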
