Querying MultiIndex DataFrame in Pandas
I have a DataFrame that looks like this:
FirstDF=
               C
A   B
'a' 'blue'    43
    'green'   59
'b' 'red'     56
'c' 'green'   80
    'orange'  72
Where A and B are set as the index. I also have a DataFrame that looks like:
SecondDF=
A B
0 'a' 'green'
1 'b' 'red'
2 'c' 'green'
Is there a way I can directly query the first DataFrame with the second one, and obtain an output like the following?
C
59
56
80
I did it by iterating over the second DataFrame, as shown below, but I would like to do it using pandas logic instead of for loops.
data=[]
for i in range(SecondDF.shape[0]):
    data.append(FirstDF.loc[tuple(SecondDF.iloc[i])])
data=pd.Series(data)
Use merge with the parameters left_index and right_on:
df = FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'])['C'].to_frame()
print (df)
C
0 59
1 56
2 80
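For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same merge. The way the two frames are constructed is an assumption (plain strings without the embedded quotes shown in the printout), and the names merged and result are only illustrative.

```python
import pandas as pd

# Sample data assumed from the question (plain strings, no embedded quotes).
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])
SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Match the two index levels of FirstDF against columns A and B of SecondDF.
merged = FirstDF.merge(SecondDF, left_index=True, right_on=["A", "B"])

# Keep only column C and discard whatever index the merge produced.
result = merged[["C"]].reset_index(drop=True)
print(result)
```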
Another solution uses isin on the MultiIndexes and filters by boolean indexing:
mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[mask, ['C']].reset_index(drop=True)
print (df)
C
0 59
1 56
2 80
Detail:
print (FirstDF.loc[mask, ['C']])
C
A B
'a' 'green' 59
'b' 'red' 56
'c' 'green' 80
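A runnable sketch of the isin approach follows, assuming the same sample data built with plain strings. It uses pd.MultiIndex.from_frame instead of set_index to build the lookup index, which is a swapped-in but equivalent way to get a MultiIndex from the two columns.

```python
import pandas as pd

# Sample data assumed from the question (plain strings, no embedded quotes).
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])
SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Build a MultiIndex of the wanted (A, B) pairs from SecondDF.
wanted = pd.MultiIndex.from_frame(SecondDF[["A", "B"]])

# One boolean per row of FirstDF: True where its (A, B) pair is wanted.
mask = FirstDF.index.isin(wanted)

df = FirstDF.loc[mask, ["C"]].reset_index(drop=True)
print(df)
```

Because the mask filters FirstDF in place, the rows come back in FirstDF's order rather than SecondDF's.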
EDIT:
You can use merge with an outer join and the indicator=True parameter, then filter by boolean indexing:
df1=FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'], indicator=True, how='outer')
print (df1)
C A B _merge
2 43 'a' 'blue' left_only
0 59 'a' 'green' both
1 56 'b' 'red' both
2 80 'c' 'green' both
2 72 'c' 'orange' left_only
mask = df1['_merge'] != 'both'
df1 = df1.loc[mask, ['C']].reset_index(drop=True)
print (df1)
C
0 43
1 72
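A variation on the same idea, sketched here under the assumption that only the rows of FirstDF are of interest: reset the index first and use a left join, so the index of the merged frame is not a concern. The names left, flagged and unmatched are illustrative, and the sample data is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question (plain strings, no embedded quotes).
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])
SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Move A and B back into columns so a plain column-on-column merge applies.
left = FirstDF.reset_index()

# indicator=True adds a _merge column saying where each row was found.
flagged = left.merge(SecondDF, on=["A", "B"], how="left", indicator=True)

# Rows marked left_only exist in FirstDF but have no partner in SecondDF.
unmatched = flagged.loc[flagged["_merge"] == "left_only", ["C"]].reset_index(drop=True)
print(unmatched)
```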
For the second solution, invert the boolean mask with ~:
mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[~mask, ['C']].reset_index(drop=True)
print (df)
C
0 43
1 72
FirstDF.loc[list(zip(SecondDF['A'], SecondDF['B']))]
Explanation:
The idea is to take the index values from the second DataFrame and use them on the first. For a MultiIndex, you can pass a tuple of index values to get a row.
FirstDF.loc[('bar','two'),]
will give you all the rows whose first index is 'bar' and second index is 'two'.
FirstDF.loc[(SecondDF['A'],SecondDF['B']),]
takes those index values directly from SecondDF, but the catch is that it selects all combinations of 'A' and 'B'. Adding zip restricts the selection to the pairs that come from the same row of SecondDF.
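To make the difference between "all combinations" and "zipped pairs" concrete, here is a small sketch. The lists a_vals and b_vals are hypothetical and chosen so that the two selections differ; with the SecondDF from the question they happen to return the same rows. The sample FirstDF is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question (plain strings, no embedded quotes).
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])

# Hypothetical lookup values: the intended pairs are (a, blue) and (c, green).
a_vals = ["a", "c"]
b_vals = ["blue", "green"]

# A tuple of lists selects every existing combination of the two levels,
# so (a, green) is picked up as well.
combos = FirstDF.loc[(a_vals, b_vals), :]

# A list of tuples selects only the explicitly paired labels.
pairs = FirstDF.loc[list(zip(a_vals, b_vals))]

print(len(combos), len(pairs))
```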
You can use merge to get the result:
In [35]: df1
Out[35]:
A B C
0 a blue 43
1 a green 59
2 b red 56
3 c green 80
4 c orange 72
In [36]: df2
Out[36]:
A B
0 a green
1 b red
2 c green
In [37]: pd.merge(df1, df2, on=['A', 'B'])['C']
Out[37]:
0 59
1 56
2 80
Name: C, dtype: int64
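Note that the session above treats A and B as ordinary columns. If they are the index, as in the question, a reset_index call bridges the gap. A minimal sketch, assuming the sample data from the question:

```python
import pandas as pd

# Sample data assumed from the question, with A and B as the index.
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])
SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# reset_index turns the A and B levels into columns so on= can refer to them.
matched = pd.merge(FirstDF.reset_index(), SecondDF, on=["A", "B"])["C"]
print(matched)
```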
OK, I found an answer:
tuple_list = list(map(tuple,SecondDF.values))
insDF = FirstDF.loc[tuple_list].dropna()
outsDF = FirstDF.loc[~FirstDF.index.isin(tuple_list)]
This gives both the rows of FirstDF that are listed in SecondDF (insDF) and those that are not (outsDF). The dropna method is used here because this query leaves the entries of SecondDF that are not in FirstDF as NaN, so they should be dropped.
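As a possible alternative, the same split can be sketched with a single boolean mask, which never looks up labels that are absent from FirstDF and so needs no dropna. The extra ('z', 'pink') row is a hypothetical pair added to show a label that is missing from FirstDF; the rest of the data is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question (plain strings, no embedded quotes).
FirstDF = pd.DataFrame(
    {"A": ["a", "a", "b", "c", "c"],
     "B": ["blue", "green", "red", "green", "orange"],
     "C": [43, 59, 56, 80, 72]}
).set_index(["A", "B"])

# SecondDF with one hypothetical pair, ('z', 'pink'), that FirstDF lacks.
SecondDF = pd.DataFrame(
    {"A": ["a", "b", "c", "z"], "B": ["green", "red", "green", "pink"]}
)

# Plain (A, B) tuples, one per row of SecondDF.
tuple_list = list(SecondDF[["A", "B"]].itertuples(index=False, name=None))

# True where a row of FirstDF has its (A, B) pair listed in SecondDF.
mask = FirstDF.index.isin(tuple_list)

insDF = FirstDF[mask]    # rows of FirstDF listed in SecondDF
outsDF = FirstDF[~mask]  # rows of FirstDF not listed in SecondDF
print(insDF)
print(outsDF)
```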