I'm looking for the syntax to return all first-tier data given multiple end-value criteria. I've been reading and finding filtering solutions with .loc or .xs, but I can't quite get the syntax for what I want. I used to work with XPath, and in essence I just want //A[ B [ @x=1 and @y=2]].
I've tried lots of permutations of the syntax I'm familiar with, using forms of df.loc, df.xs, multi-level [], a little with df.index.get_level_values(), etc.
So from a dataframe like this:
     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
c d  2  3
I want to search for a specific combo of x and y and return all rows at the A index level.
So I want x=1 and y=2 and get
     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
This is because at least one row under each returned A matches.
An even better, more general solution would be to search for an x value at one particular B and a y value at a different particular B.
(Trying for more clarity:) By this I mean that, instead of only matching end-level values, I may be interested in combinations tied to specific B values. Below I have B1 = b and x=3, so I'm mixing a match on a column value with a match on an index value, whereas before I constrained two end values only. Again, I envision this in XPath as //A[ B[ local-name() = 'b' and @x=3 ] and B[ local-name() = 'f' and @y=5 ] ]
(I think I got that right).
For example, B1 = b with x=3 and B2 = f with y=5, returning:
     x  y
A B
a b  1  2
  f  4  5
  c  3  4
Thanks!
You can query your dataframe in a couple of steps:
A_idx = df.query('x == 1 & y == 2').index.get_level_values('A')
res = df.query('A in @A_idx')
print(res)
#      x  y
# A B
# a b  1  2
#   f  4  5
#   c  3  4
# b d  1  5
#   c  1  2
Setup
import pandas as pd
df = pd.DataFrame([['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
                   ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
                  columns=['A', 'B', 'x', 'y'])
df = df.set_index(['A', 'B'])
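The two query calls above can also be written with plain boolean masks, which may be easier to read if you prefer to avoid query strings. This is only a sketch on the same sample data; the names row_hit and matching_a are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
     ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
    columns=['A', 'B', 'x', 'y'],
).set_index(['A', 'B'])

# Step 1: rows where both conditions hold on the same row.
row_hit = (df['x'] == 1) & (df['y'] == 2)

# Step 2: the A labels that own at least one such row.
matching_a = df[row_hit].index.get_level_values('A').unique()

# Step 3: keep every row whose A label is in that set.
res = df[df.index.get_level_values('A').isin(matching_a)]
print(res)
```

Group c is dropped because none of its rows has x == 1 and y == 2.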
Using groupby + transform + any:
df[df.eq({'x':1,'y':2}).groupby(level=0).transform('any').any(axis=1)]
     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
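One thing to watch with the one-liner above: the final any across columns keeps a group when x == 1 or y == 2 occurs anywhere in it, not necessarily on the same row. If the match has to be on a single row, as in the XPath from the question, a possible sketch is to combine the columns with all before grouping. The helper name groups_with_row_match is hypothetical, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
     ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
    columns=['A', 'B', 'x', 'y'],
).set_index(['A', 'B'])


def groups_with_row_match(frame, wanted, level='A'):
    """Keep every group of `level` that has a row matching all of `wanted`."""
    target = pd.Series(wanted)
    # True only where every requested column matches on the same row.
    same_row = frame[list(target.index)].eq(target, axis='columns').all(axis=1)
    # Broadcast "any row in my group matched" back to every row.
    keep = same_row.groupby(level=level).transform('any')
    return frame[keep]


res_12 = groups_with_row_match(df, {'x': 1, 'y': 2})  # groups a and b
res_15 = groups_with_row_match(df, {'x': 1, 'y': 5})  # group b only
print(res_12)
print(res_15)
```

With x=1 and y=5 only group b survives, because group a has x == 1 and y == 5 on different rows.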
You can use groupby on level='A' and filter, after first creating a flag column for each of the x and y columns with numpy.where that marks whether the value you are looking for is present.
#using @jpp setup
import numpy as np
df['flagx'] = np.where(df.x == 1,1,0)
df['flagy'] = np.where(df.y == 5,1,0)
Now, if you want both x and y to meet their condition for any value of B within the same A (not necessarily on the same row), you can use any on each flag and combine the two results with &:
print (df.groupby(level='A').filter(lambda dfg: dfg.flagx.any() & dfg.flagy.any() )
.drop(['flagx','flagy'],axis=1))
     x  y
A B
a b  1  2
  f  4  5
  c  3  4
b d  1  5
  c  1  2
If you want both conditions on x and y to be met on the same row, swap the positions of any and & in the filter:
print (df.groupby(level='A').filter(lambda dfg: (dfg.flagx & dfg.flagy).any() )
.drop(['flagx','flagy'],axis=1))
     x  y
A B
b d  1  5
  c  1  2
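The more general case from the question (one condition tied to a specific B label, another tied to a different B label) can be handled with the same groupby + filter idea, without helper columns. The following is only a sketch: has_row is a hypothetical helper, and the values B=b with x=1 and B=f with y=5 are an assumption chosen so that they match the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 2], ['a', 'f', 4, 5], ['a', 'c', 3, 4],
     ['b', 'd', 1, 5], ['b', 'c', 1, 2], ['c', 'd', 2, 3]],
    columns=['A', 'B', 'x', 'y'],
).set_index(['A', 'B'])


def has_row(group, b_label, column, value):
    """True if the group has a row with this B label and this column value."""
    b_matches = group.index.get_level_values('B') == b_label
    value_matches = (group[column] == value).to_numpy()
    return bool((b_matches & value_matches).any())


# Keep an A group only if it has B == 'b' with x == 1
# and also B == 'f' with y == 5 (possibly on different rows).
res = df.groupby(level='A').filter(
    lambda g: has_row(g, 'b', 'x', 1) and has_row(g, 'f', 'y', 5)
)
print(res)
```

On the sample data only group a satisfies both conditions, so all three of its rows are returned.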