I have two multi-indexed dataframes of unequal size that I want to compare
I have two multi-indexed dataframes (df1 and df2) of similar shape but different size.
I want to compare the columns of the two dataframes and show the comparison on the bigger dataframe's scaffold, with blanks in the rows that don't match.
- df1:

```
                col1
one two three
a   1.0 abc        1
        mno        2
        xyz        3
    2.0 abc        4
        mno        5
        xyz        6
b   1.0 abc        7
        mno        8
        xyz        9
    2.0 abc       10
        mno       11
        xyz       12
```
- df2:

```
                0
one two three
a   1.0 abc    18
        mno    18
        xyz    19
        lpq    18
    2.0 abc     7
        mno     4
        xyz    13
        lpq     8
b   1.0 abc     8
        mno     5
        xyz     4
        lpq    14
    2.0 abc    12
        mno    16
        xyz     6
        lpq     7
c   1.0 abc     5
        mno     0
        xyz     0
        lpq    19
    2.0 abc    14
        mno     7
        xyz     0
        lpq     6
```
I've already tried comparing the two dataframes with a simple difference, hoping the result would contain empty rows where there's a mismatch, but I ended up with a much bigger dataframe containing multiple copies of the same row and many empty ranges of rows.
To recreate the dataframes:
```python
import pandas as pd
import numpy as np

# df1: 12 rows, keys a/b x 1.0/2.0 x abc/mno/xyz
index_1 = pd.MultiIndex.from_product([['a','b'],[1.,2],['abc','mno','xyz']], names=['one','two','three'])
df1 = pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9,10,11,12]}, index=index_1)
# df2: 24 rows, adds group 'c' and label 'lpq'; values are random (no seed set)
index_2 = pd.MultiIndex.from_product([['a','b','c'],[1.,2],['abc','mno','xyz','lpq']], names=['one','two','three'])
df2 = pd.DataFrame(np.random.randint(0,20,size=(24, 1)), index=index_2)
```
My desired dataframe should look like the bigger one (df2):
```
                0
one two three
a   1.0 abc    18
        mno    18
        xyz    19
        lpq
    2.0 abc     7
        mno     4
        xyz    13
        lpq
b   1.0 abc     8
        mno     5
        xyz     4
        lpq
    2.0 abc    12
        mno    16
        xyz     6
        lpq
c   1.0 abc     5
        mno     0
        xyz     0
        lpq
    2.0 abc    14
        mno     7
        xyz     0
        lpq
```
This problem has baffled me for days and I would appreciate any help.
`where` + `isin`
```python
df2.where(pd.Series(df2.index.isin(df1.index),
                    index=df2.index))
```
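A small end-to-end sketch of the `where` + `isin` idea. It rebuilds df1 and df2 as in the question (the fixed seed via `np.random.default_rng(0)` is an assumption added only for reproducibility) and applies the full-key mask. Under full-key matching every `c` row is blanked as well, because `c` never occurs in df1. The desired output in the question keeps the `c` values and blanks only the `lpq` rows; matching on the `two` and `three` levels alone reproduces that pattern, so the sketch also shows that variant. Whether that is the intended rule is an assumption, not something the question states.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question. The fixed seed is an assumption
# added only so that df2's random values are reproducible.
rng = np.random.default_rng(0)
names = ['one', 'two', 'three']
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=names)
df1 = pd.DataFrame({'col1': list(range(1, 13))}, index=index_1)
index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']], names=names)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

# One boolean per row of df2: True if the full (one, two, three) key also
# occurs in df1. Indexing the mask like df2 lets DataFrame.where apply it
# to every column.
full_mask = pd.Series(df2.index.isin(df1.index), index=df2.index)
full_match = df2.where(full_mask)  # df2's shape, NaN where there is no match

# Hypothetical alternative rule: match on the 'two' and 'three' levels only,
# ignoring 'one'. This blanks just the 'lpq' rows and keeps group 'c'.
partial_mask = pd.Series(
    df2.index.droplevel('one').isin(df1.index.droplevel('one')),
    index=df2.index)
partial_match = df2.where(partial_mask)

print(full_match)
print(partial_match)
```

In both cases the result keeps df2's index, so the bigger frame's scaffold is preserved and only the values of unmatched rows become NaN.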
Another way is to `reindex` twice (if the indices are unique):
```python
df2.reindex(df1.index).reindex(df2.index)
```
```
                  0
one two three
a   1.0 abc    11.0
        mno     5.0
        xyz     8.0
        lpq     NaN
    2.0 abc     5.0
        mno     2.0
        xyz    19.0
        lpq     NaN
b   1.0 abc     5.0
        mno    19.0
        xyz    11.0
        lpq     NaN
    2.0 abc     2.0
        mno    13.0
        xyz    12.0
        lpq     NaN
```
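A sketch that checks the double `reindex` against `where` + `isin` (same setup as the question; the seed is again an added assumption). Because the second `reindex` restores df2's full index, the result has all 24 rows: the printed output above stops after the `b` group, but the `c` rows are present and NaN. The last part illustrates the "if the indices are unique" condition: `reindex` raises a `ValueError` when the frame being reindexed has duplicate keys.

```python
import numpy as np
import pandas as pd

# Setup as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
names = ['one', 'two', 'three']
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=names)
df1 = pd.DataFrame({'col1': list(range(1, 13))}, index=index_1)
index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']], names=names)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

# Step 1: keep only the df2 rows whose key is in df1.
# Step 2: reindex back onto df2's full index; every key dropped in step 1
# (all 'lpq' rows and the whole 'c' group) reappears with NaN.
double = df2.reindex(df1.index).reindex(df2.index)

# Same frame as the where + isin version.
via_where = df2.where(pd.Series(df2.index.isin(df1.index), index=df2.index))
print(double.equals(via_where))

# Duplicate keys: append df2's first row again and try the first reindex.
dup = pd.concat([df2, df2.iloc[:1]])
raised_on_duplicates = False
try:
    dup.reindex(df1.index)
except ValueError as exc:
    raised_on_duplicates = True
    print("ValueError:", exc)
```

When duplicates are possible, the `where` + `isin` route does not have this restriction, since it only builds a boolean mask over df2's existing rows.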
You can replace the `NaN` with blanks, but I don't advise it, since the column then becomes `object` dtype.
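To illustrate the dtype point, a short sketch (same setup, seed assumed): filling with `''` turns the numeric column into `object`, whereas `DataFrame.to_string(na_rep='')` only changes how NaN is displayed and leaves the data numeric. The nullable `Int64` dtype is an extra option not mentioned in the answer; it keeps the values as integers and shows missing entries as `<NA>`.

```python
import numpy as np
import pandas as pd

# Setup as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
names = ['one', 'two', 'three']
index_1 = pd.MultiIndex.from_product(
    [['a', 'b'], [1., 2], ['abc', 'mno', 'xyz']], names=names)
df1 = pd.DataFrame({'col1': list(range(1, 13))}, index=index_1)
index_2 = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1., 2], ['abc', 'mno', 'xyz', 'lpq']], names=names)
df2 = pd.DataFrame(rng.integers(0, 20, size=(24, 1)), index=index_2)

result = df2.where(pd.Series(df2.index.isin(df1.index), index=df2.index))

# Filling NaN with '' mixes strings and numbers, so the column becomes object.
blanked = result.fillna('')
print(result[0].dtype, blanked[0].dtype)  # float64 object

# Display-only alternative: keep the numeric column, render NaN as blank.
print(result.to_string(na_rep=''))

# Nullable integer dtype: integers stay integers, missing values show as <NA>.
as_int = result.astype('Int64')
print(as_int[0].dtype)  # Int64
```

If the blanks are only needed for output, `to_csv` already writes NaN as an empty field by default, so the column can stay numeric in memory.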