
Pandas MultiIndex match on one index level

I have a pandas MultiIndex where the first level ('frst') is a regular increasing index of ints, and the second level ('scnd') contains other integers that may or may not repeat across different 'frst' values:

import pandas as pd

lst = list(filter(lambda x: x[1]%5 == x[0] or x[1]%4 == x[0],[(i,j) for i in range(5) for j in range(0, 20, 2)]))
mi = pd.MultiIndex.from_tuples(lst).rename(['frst', 'scnd'])
# mi = MultiIndex([(0,  0),(0,  4),(0,  8),(0, 10),(0, 12),(0, 16),(1,  6),(1, 16),(2,  2),(2,  6),(2, 10),(2, 12),(2, 14),(2, 18),(3,  8),(3, 18),(4,  4),(4, 14)], names=['frst', 'scnd'])

For a given frst value (e.g. frst_idx = 0) and some shift, I need to find all scnd values that appear under both frst_idx and frst_idx + shift.

So for example:

  • frst_idx = 0, shift = 3 should output [8] because the MultiIndex above contains both (0, 8) and (3, 8).
  • frst_idx = 1, shift = 1 should output [6] because (1, 6) and (2, 6) are both in the index.

So I'm hoping for a function that can take these args and return a pd.Series of all matching scnd values:

my_func(multi_index=mi, frst_idx=0, shift=3) ==> pd.Series([8])

Doing this iteratively is very expensive (O(n^2)), so I'm hoping there's some pandas magic to do this faster.
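For reference, the iterative approach I want to avoid looks roughly like the sketch below. The name shared_scnd_naive is made up for illustration, and it works on plain (frst, scnd) tuples, which is also what iterating over the MultiIndex yields.

```python
def shared_scnd_naive(pairs, frst_idx, shift):
    """Quadratic baseline: compare every entry under frst_idx with every entry."""
    pairs = list(pairs)
    target = frst_idx + shift
    matches = []
    for f1, s1 in pairs:
        if f1 != frst_idx:
            continue
        # Inner scan over the whole index is what makes this O(n^2)
        for f2, s2 in pairs:
            if f2 == target and s2 == s1:
                matches.append(s1)
    return matches


# Same tuples as the MultiIndex in the question
pairs = [(i, j) for i in range(5) for j in range(0, 20, 2)
         if j % 5 == i or j % 4 == i]

print(shared_scnd_naive(pairs, frst_idx=0, shift=3))  # [8]
print(shared_scnd_naive(pairs, frst_idx=1, shift=1))  # [6]
```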

I found the following solution:

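# reminder: mi is a MultiIndex with mi.names == ['frst', 'scnd']
# assume some integer values for frst_idx1 and shift

# scnd values stored under frst_idx1 (the method is droplevel, not drop_level)
scnd_indices1 = mi[mi.get_level_values('frst') == frst_idx1].droplevel('frst')

# scnd values stored under the shifted frst value
frst_idx2 = frst_idx1 + shift
scnd_indices2 = mi[mi.get_level_values('frst') == frst_idx2].droplevel('frst')

# keep only the scnd values present under both, as a Series with a fresh 0..n-1 index
result = scnd_indices1.intersection(scnd_indices2).to_series().reset_index(drop=True)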
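Wrapped up as the my_func from the question, this could look like the sketch below. It assumes the levels are named 'frst' and 'scnd' as above, and it rebuilds mi so that it runs on its own.

```python
import pandas as pd


def my_func(multi_index, frst_idx, shift):
    """Return the scnd values found under both frst_idx and frst_idx + shift."""
    frst = multi_index.get_level_values('frst')
    # Select the rows for each frst value, then keep only the scnd level
    left = multi_index[frst == frst_idx].droplevel('frst')
    right = multi_index[frst == frst_idx + shift].droplevel('frst')
    return left.intersection(right).to_series().reset_index(drop=True)


# Same MultiIndex as in the question
lst = [(i, j) for i in range(5) for j in range(0, 20, 2)
       if j % 5 == i or j % 4 == i]
mi = pd.MultiIndex.from_tuples(lst, names=['frst', 'scnd'])

print(my_func(mi, frst_idx=0, shift=3).tolist())  # [8]
print(my_func(mi, frst_idx=1, shift=1).tolist())  # [6]
```

If frst_idx + shift is not in the index at all, the second selection is empty and so is the result.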

