
DataFrame filter with MultiIndex: return all rows at the top index level given value filters

I'm looking for the syntax to return all first-tier data given multiple end-value criteria. I've been reading and finding filtering solutions with .loc or .xs, but I can't quite get the syntax for what I want. I used to work with XPath, and in essence I just want //A[B[@x=1 and @y=2]].

I've tried lots of permutations of the syntax I'm familiar with, using forms of df.loc, df.xs, multiple [] lookups, a little of df.index.get_level_values(), etc.

So, from a dataframe like this:

     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
c d  2  3

I want to search for a specific combination of x and y and return all rows of every A group that contains it.

So for x=1 and y=2 I want to get:

     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2

This is because at least one row in each of those A groups matches.

An even better, more general solution would let me search for an x value of one particular B and a y value of a different particular B.

(Trying for more clarity:) By this I mean that, instead of only constraining end-level values, I may be interested in combinations involving specific B values. Below I have B1 = b and x=3, so I'm mixing matching a column value with matching an index value, whereas before I only constrained two end values. Again, I picture this in XPath as //A[B[local-name()='b' and @x=3] and B[local-name()='f' and @y=5]] (I think I got that right).

For example, B1 = b with x = 3 and B2 = f with y = 5, returning:

     x  y
A B
a b  1  2
  f  4  5
  c  3  4
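A minimal sketch of the more general case, assuming the same data as the Setup in the answers below. `groups_matching_all` is a hypothetical helper written for this example: each condition is a (B label, column, value) tuple, and an A group is kept only if, for every tuple, it contains a row with that B label whose column equals the value. In that data the row with x = 3 is (a, c), so the usage line uses c; with b and x = 3 nothing would match.

```python
import pandas as pd

# Same data as the "Setup" in the answers below
df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y']).set_index(['A', 'B'])


def groups_matching_all(frame, conditions):
    """Hypothetical helper: keep every row of each A group that satisfies
    all (b_label, column, value) conditions, each possibly on a different row."""
    b_labels = frame.index.get_level_values('B')
    mask = pd.Series(True, index=frame.index)
    for b_label, column, value in conditions:
        # rows that carry the requested B label and column value
        hit = (frame[column] == value) & (b_labels == b_label)
        # broadcast "does this A group contain such a row?" to all its rows
        mask &= hit.groupby(level='A').transform('any')
    return frame[mask]


# B = c with x = 3, and B = f with y = 5
res = groups_matching_all(df, [('c', 'x', 3), ('f', 'y', 5)])
print(res)
```

Each condition gets its own per-group `transform('any')` and the masks are combined with `&`, so the conditions may be satisfied on different rows of the same A group; here only the three rows of group a are returned.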

Thanks!

You can query your dataframe in two steps:

# A labels of the rows where x == 1 and y == 2
A_idx = df.query('x == 1 & y == 2').index.get_level_values('A')
# keep every row whose A label is in A_idx
res = df.query('A in @A_idx')

print(res)

#      x  y
# A B      
# a b  1  2
#   f  4  5
#   c  3  4
# b d  1  5
#   c  1  2

Setup

import pandas as pd

df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y'])

df = df.set_index(['A', 'B'])
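The same two steps can also be written without query strings, using boolean masks and `Index.isin` on the A level. A sketch under the same setup (`row_hit` is an illustrative name, not from the answer):

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y']).set_index(['A', 'B'])

# Step 1: rows where x == 1 and y == 2 hold on the same row
row_hit = (df['x'] == 1) & (df['y'] == 2)
A_idx = row_hit[row_hit].index.get_level_values('A')

# Step 2: keep every row whose A label appears in A_idx
res = df[df.index.get_level_values('A').isin(A_idx)]
print(res)
```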

Using groupby + transform + any

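# all(axis=1): both conditions on the same row; transform('any'): spread the result over each A group
df[df.eq({'x': 1, 'y': 2}).all(axis=1).groupby(level=0).transform('any')]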
     x  y
A B      
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
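The order of the reductions matters. Reducing each column per group with `transform('any')` and then using `any(axis=1)` keeps a group if either condition appears on some row of it, whereas `all(axis=1)` before grouping requires one row to meet both. With the question's x == 1 and y == 2 both orders happen to return groups a and b, so this sketch uses y == 5 (as the next answer does) to make the difference visible; `matches`, `either_anywhere` and `same_row` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y']).set_index(['A', 'B'])

# Column-wise comparison: the Series index ('x', 'y') is aligned with df.columns
matches = df.eq(pd.Series({'x': 1, 'y': 5}))

# per-group, per-column "any", then "any" across columns:
# a group is kept if x == 1 OR y == 5 occurs on some row of it
either_anywhere = df[matches.groupby(level=0).transform('any').any(axis=1)]

# "all" across columns first, then per-group "any":
# a group is kept only if a single row has x == 1 AND y == 5
same_row = df[matches.all(axis=1).groupby(level=0).transform('any')]

print(either_anywhere)
print(same_row)
```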

You can use groupby on level 'A' with filter, after using numpy.where to create a flag column for each of x and y that marks whether the value you are looking for is there.

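# using @jpp's setup
import numpy as np
# 1 where the value matches, else 0
df['flagx'] = np.where(df.x == 1, 1, 0)
# y == 5 here (not 2); with it, the two filters below return different groups
df['flagy'] = np.where(df.y == 5, 1, 0)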

Now, if you want both the x condition and the y condition to be met for some value of B within the same A (not necessarily on the same row), you can call any on each flag and combine the two results with &:

print (df.groupby(level='A').filter(lambda dfg: dfg.flagx.any() & dfg.flagy.any() )
         .drop(['flagx','flagy'],axis=1))
     x  y
A B      
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2

If you want both conditions on x and y to be met on the same row, swap the order of any and & in the filter:

print (df.groupby(level='A').filter(lambda dfg: (dfg.flagx & dfg.flagy).any() )
         .drop(['flagx','flagy'],axis=1))
     x  y
A B      
b d  1  5
  c  1  2
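The flag columns are not strictly needed: a sketch of the same two filters written directly on each group's columns, which also leaves df unmodified. `each_somewhere` and `both_on_one_row` are illustrative function names, and the x == 1 / y == 5 conditions are the ones used above.

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y']).set_index(['A', 'B'])


def each_somewhere(g):
    # illustrative helper: each condition holds on some row of the group
    return (g['x'] == 1).any() and (g['y'] == 5).any()


def both_on_one_row(g):
    # illustrative helper: one row of the group satisfies both conditions
    return ((g['x'] == 1) & (g['y'] == 5)).any()


kept_any = df.groupby(level='A').filter(each_somewhere)
kept_same = df.groupby(level='A').filter(both_on_one_row)
print(kept_any)
print(kept_same)
```

As with the flag version, the first filter keeps groups a and b and the second keeps only group b; no `drop` is needed afterwards because no helper columns were added.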

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 