Pandas MultiIndex.isin across all index columns

Question

I have a dataframe and a list of values.

vehicles = ['van','car','bike']

a	b	c
car	dog	1
van	bike	2
van	car	3
dog	van	4

To find all rows where both values are vehicles, I can do the following

toy_df[ ( toy_df['a'].isin(vehicles) ) & ( toy_df['b'].isin(vehicles) ) ]

...this works fine. But my data is much larger, and I thought using the index would be more efficient. If I set the index to a multi-index

toy_df = toy_df.set_index(['a','b'])

...how would I use these index levels to return the same result? I tried...

filter_a = toy_df.index.isin(vehicles,level=0)
filter_b = toy_df.index.isin(vehicles,level=1)

filter_a
>>  array([ True,  True,  True, False])

filter_b
>> array([False,  True,  True,  True])

But I don't know how to use the filters with .loc, and I'm not sure this is the most efficient approach. Any guidance would be appreciated.
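A minimal sketch of how the two masks could be combined, assuming the example table is built by hand as below (the construction is not shown in the question). `MultiIndex.isin` returns a boolean array per level, so the two arrays can be combined element-wise with `&` and passed to `.loc`:

```python
import pandas as pd

vehicles = ['van', 'car', 'bike']

# Assumed construction of the example table from the question
toy_df = pd.DataFrame({
    'a': ['car', 'van', 'van', 'dog'],
    'b': ['dog', 'bike', 'car', 'van'],
    'c': [1, 2, 3, 4],
}).set_index(['a', 'b'])

# One boolean array per index level
filter_a = toy_df.index.isin(vehicles, level=0)
filter_b = toy_df.index.isin(vehicles, level=1)

# Element-wise AND keeps only rows where both levels are vehicles
result = toy_df.loc[filter_a & filter_b]
print(result)
```

With this data, the rows kept are (van, bike) and (van, car), the same rows the column-based filter returns.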

Answer 1

I used your original dataframe, here called df, and ran the following code:

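import time

import pandas as pd

# The original dataframe from the question, here called df
df = pd.DataFrame({'a': ['car', 'van', 'van', 'dog'],
                   'b': ['dog', 'bike', 'car', 'van'],
                   'c': [1, 2, 3, 4]})
df2 = df.set_index(['a', 'b'])

vehicles = ['van','car','bike']

def myfunc1():
    # Filter on the columns; & so that both a and b must be vehicles, as in the question
    return df[(df.a.isin(vehicles)) & (df.b.isin(vehicles))]

def myfunc2():
    # Filter on the two levels of the multi-index
    return df2[(df2.index.isin(vehicles, level=0)) & (df2.index.isin(vehicles, level=1))]

n = 1000
for test in range(5):
    t0 = time.perf_counter()
    for i in range(n): myfunc1()
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    for i in range(n): myfunc2()
    t3 = time.perf_counter()

    total_1 = t1 - t0  # column filter
    total_2 = t3 - t2  # index filter

    print(test, ":", total_1, total_2)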

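In the printout (first number: filtering on the columns, second number: filtering on the index), you can see that filtering on the index takes roughly half the time needed to filter on the columns.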

0 : 0.8234035968780518 0.37520408630371094
1 : 0.7863156795501709 0.3657698631286621
2 : 0.7700819969177246 0.36788105964660645
3 : 0.7782289981842041 0.4089479446411133
4 : 0.8350069522857666 0.38277411460876465
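The frame above has only four rows, so the timings are likely dominated by per-call overhead rather than by the filtering itself. A rough sketch of repeating the comparison on a larger frame with `timeit`, assuming synthetic data: the row count, the word pool, and the helper names `by_columns` and `by_index` are made up for illustration, and no particular timings are claimed.

```python
import timeit

import numpy as np
import pandas as pd

vehicles = ['van', 'car', 'bike']

# Hypothetical synthetic data: 100,000 rows drawn from a small word pool
rng = np.random.default_rng(0)
words = ['van', 'car', 'bike', 'dog', 'cat']
n_rows = 100_000
df = pd.DataFrame({
    'a': rng.choice(words, n_rows),
    'b': rng.choice(words, n_rows),
    'c': np.arange(n_rows),
})
df2 = df.set_index(['a', 'b'])

def by_columns():
    # Boolean mask built from the two columns
    return df[df['a'].isin(vehicles) & df['b'].isin(vehicles)]

def by_index():
    # Boolean mask built from the two index levels
    mask = df2.index.isin(vehicles, level=0) & df2.index.isin(vehicles, level=1)
    return df2[mask]

# Both approaches should select the same rows
assert by_columns()['c'].tolist() == by_index()['c'].tolist()

print('columns:', timeit.timeit(by_columns, number=20))
print('index:  ', timeit.timeit(by_index, number=20))
```

Running this on data shaped like the real workload is the safer way to decide whether setting the index pays off, since the cost of `set_index` itself is not included here.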

Answer 1 accepted 2021-02-02 01:40:07
