
Querying MultiIndex DataFrame in Pandas

I have a DataFrame that looks like this:

FirstDF=
              C
A    B      
'a' 'blue'   43
    'green'  59
'b' 'red'    56
'c' 'green'  80
    'orange' 72

Here A and B are set as the index levels. I also have a DataFrame that looks like this:

SecondDF=

    A     B
0  'a'  'green'
1  'b'  'red'
2  'c'  'green'

Is there a way to query the first DataFrame directly with the second one and obtain an output like the following?

C
59
56
80

I did it by iterating over the second DataFrame, as shown below, but I would like to do it with pandas operations instead of a for loop.

data=[]
for i in range(SecondDF.shape[0]):
    data.append(FirstDF.loc[tuple(SecondDF.iloc[i])])
data=pd.Series(data)

Use merge with the parameters left_index and right_on:

df = FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'])['C'].to_frame()
print (df)
    C
0  59
1  56
2  80
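
For anyone who wants to run this, here is a minimal self-contained sketch of the merge approach. It assumes the index and column values are plain strings (the quotes in the printed frames above are taken to be display only), and it selects `[['C']]` instead of calling `to_frame()`, which should give the same single-column frame.

```python
import pandas as pd

# Rebuild the frames from the question (assumption: plain string labels).
FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Join the index levels of FirstDF against the A/B columns of SecondDF.
# The default inner join keeps only the pairs present in both frames.
result = FirstDF.merge(SecondDF, left_index=True, right_on=["A", "B"])[["C"]]
print(result)
```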

Another solution uses isin on the MultiIndexes and filters by boolean indexing:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[mask, ['C']].reset_index(drop=True)
print (df)
    C
0  59
1  56
2  80

Detail:

print (FirstDF.loc[mask, ['C']])
              C
A   B          
'a' 'green'  59
'b' 'red'    56
'c' 'green'  80
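
A runnable sketch of the mask approach, again assuming plain string labels. `pd.MultiIndex.from_frame` is used here as one more way to build the lookup index; it is a swapped-in variant, not what the answer above uses.

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Turn the lookup columns into a MultiIndex of (A, B) pairs.
wanted = pd.MultiIndex.from_frame(SecondDF[["A", "B"]])

# One boolean per row of FirstDF: True where the (A, B) pair is wanted.
mask = FirstDF.index.isin(wanted)

# Boolean indexing keeps FirstDF's row order and its original MultiIndex.
selected = FirstDF.loc[mask, ["C"]]
print(selected)
```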

EDIT:

To get the rows of FirstDF that do not match SecondDF instead, you can use merge with an outer join and the indicator=True parameter, then filter by boolean indexing:

df1=FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'], indicator=True, how='outer')
print (df1)
    C    A         B     _merge
2  43  'a'    'blue'  left_only
0  59  'a'   'green'       both
1  56  'b'     'red'       both
2  80  'c'   'green'       both
2  72  'c'  'orange'  left_only

mask = df1['_merge'] != 'both'
df1 = df1.loc[mask, ['C']].reset_index(drop=True)
print (df1)
    C
0  43
1  72

For the second solution, invert the boolean mask with ~:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[~mask, ['C']].reset_index(drop=True)
print (df)
    C
0  43
1  72

FirstDF.loc[list(zip(SecondDF['A'], SecondDF['B'])), :]

Explanation:

The idea is to take the index values from the second DataFrame and use them on the first DataFrame. For a MultiIndex, you can pass a tuple of index values to get the row.

FirstDF.loc[('bar','two'),] 

will give you all the rows whose first index level is 'bar' and whose second index level is 'two'.

FirstDF.loc[(SecondDF['A'],SecondDF['B']),] 

takes those index values directly from SecondDF, but the catch is that it selects all combinations of 'A' and 'B'. Adding zip restricts the selection to the pairs that come from the same row of SecondDF.
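
With the SecondDF from the question, both forms happen to return the same three rows, so the difference is easier to see with another lookup. The sketch below uses a hypothetical two-row frame called `pairs` (not part of the question) and assumes plain string labels.

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"]).sort_index()

# Hypothetical lookup for illustration: the pairs (a, green) and (c, orange).
pairs = pd.DataFrame({"A": ["a", "c"], "B": ["green", "orange"]})

# A tuple of per-level lists selects every existing combination of
# A in {a, c} and B in {green, orange}, so (c, green) is included as well.
combos = FirstDF.loc[(pairs["A"].tolist(), pairs["B"].tolist()), :]

# A list of (A, B) tuples selects only the row-wise pairs.
paired = FirstDF.loc[list(zip(pairs["A"], pairs["B"])), :]

print(combos)
print(paired)
```

Here `combos` has three rows while `paired` has only the two that were asked for.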

You can use merge to get the result:

In [35]: df1
Out[35]:
   A       B   C
0  a    blue  43
1  a   green  59
2  b     red  56
3  c   green  80
4  c  orange  72

In [36]: df2
Out[36]:
   A      B
0  a  green
1  b    red
2  c  green

In [37]: pd.merge(df1, df2, on=['A', 'B'])['C']
Out[37]:
0    59
1    56
2    80
Name: C, dtype: int64

OK, I found an answer:

tuple_list = list(map(tuple,SecondDF.values))
insDF = FirstDF.loc[tuple_list].dropna()
outsDF = FirstDF.loc[~FirstDF.index.isin(tuple_list)]

This gives both the rows of FirstDF that are listed in SecondDF (insDF) and the rows that are not (outsDF). The dropna method is used here because this query returns the SecondDF entries that are not in FirstDF as NaN rows, so they should be dropped.
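
As far as I know, recent pandas versions raise a KeyError when `.loc` is given a list containing missing labels instead of returning NaN rows, so the snippet above may depend on the pandas version. A sketch that reproduces the described NaN-then-dropna behaviour with `reindex` follows; the `lookup` frame and its missing pair ('z', 'purple') are hypothetical and only there to show the effect.

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

# Hypothetical lookup: the last pair is deliberately absent from FirstDF.
lookup = pd.DataFrame({"A": ["a", "b", "z"], "B": ["green", "red", "purple"]})
target = pd.MultiIndex.from_frame(lookup)

# reindex inserts a NaN row for every label it cannot find; dropna removes it.
# Note that C becomes float because of the intermediate NaN.
insDF = FirstDF.reindex(target).dropna()

# Rows of FirstDF whose (A, B) pair is not in the lookup.
outsDF = FirstDF.loc[~FirstDF.index.isin(target)]

print(insDF)
print(outsDF)
```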
