I have a pandas DataFrame like the following:
import numpy as np
import pandas

# pandas.np was removed in pandas 2.0, so NumPy is imported directly
df = pandas.DataFrame({'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
                             'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
                       'B': ['one', 'one', 'two', 'two', 'one', 'one',
                             'two', 'two', 'one', 'one', 'two', 'two'],
                       'C': np.random.randn(12)})
df
      A    B         C
0   foo  one -0.241101
1   foo  one -0.658436
2   foo  two  0.300752
3   foo  two -0.589445
4   bar  one  1.775511
5   bar  one  0.068603
6   bar  two -0.464550
7   bar  two -0.621055
8   baz  one -1.469311
9   baz  one  0.490963
10  baz  two -0.606491
11  baz  two -0.006323
What I want to do is filter out the rows whose value in C is smaller than the mean of their (A, B) group.
The grouping works:
groups = df.groupby([df.A, df.B])
upper_bound = groups.C.mean()
upper_bound
A    B
bar  one    0.922057
     two   -0.542803
baz  one   -0.489174
     two   -0.306407
foo  one   -0.449768
     two   -0.144346
Name: C, dtype: float64
But how do I filter now so that (in this example) row 1 (foo, one, -0.658436) would be removed?
I tried the following things:
df_ = df.loc[df.C <= upper_bound.loc[df.A, df.B]]
But that says
'None of [0 foo\n1 foo\n2 foo\n3 foo\n4 bar\n5 bar\n6 bar\n7 bar\n8 baz\n9 baz\n10 baz\n11 baz\nName: A, dtype: object] are in the [index]'
And I tried:
df_ = df.loc[df.C <= upper_bound[df.A, df.B]]
and that gives me:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3492)()
TypeError:
I am trying it this way because I already managed (at least I think so) to do the 'same' thing with a one-level grouping:
groups = df.groupby([df.A])
upper_bound = groups.C.mean()
df_ = df.loc[df.C <= upper_bound.loc[df.A]]
And that actually gets rid of everything in df where C is below upper_bound.
Any ideas on what I am doing wrong?
You compared upper_bound, the result of the groupby, to df['C'], but the two have a different number of elements: upper_bound holds one value per group, while df['C'] holds one value per row. Use transform to get the group mean repeated for every row of its group, compare that to df['C'], and apply the resulting mask with loc:
# passing the string 'mean' avoids the warning newer pandas emits for np.mean
df.loc[df['C'] >= df.groupby(['A', 'B'])['C'].transform('mean')]
Out[13]:
      A    B         C
0   foo  one  0.579987
3   foo  two  1.701136
5   bar  one  1.955158
7   bar  two  0.943862
9   baz  one -0.628506
10  baz  two  1.097203
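Because the data above is random, the output differs on every run. Here is a minimal, reproducible sketch of the same idea. The values in C are made up for illustration, and the names group_mean, kept, row_keys and kept_alt are hypothetical. The second half shows one possible way to make the original "look up the aggregated means" attempt work, by reindexing the aggregated Series with a MultiIndex built from the (A, B) pair of each row; this is an alternative I am assuming fits the question, not something the answer above relies on.

```python
import pandas as pd

# Small deterministic frame (values are invented for illustration).
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "two", "two", "one", "one", "two", "two"],
    "C": [1.0, 3.0, 2.0, 6.0, -1.0, 5.0, 0.0, 4.0],
})

# transform returns one value per original row, aligned on df's index,
# so it can be compared to df["C"] element by element.
group_mean = df.groupby(["A", "B"])["C"].transform("mean")

# Keep only rows whose C is at least the mean of their (A, B) group.
kept = df.loc[df["C"] >= group_mean]  # rows 1, 3, 5 and 7 remain

# Alternative: aggregate first, then look the means up row by row.
means = df.groupby(["A", "B"])["C"].mean()          # one value per group
row_keys = pd.MultiIndex.from_frame(df[["A", "B"]])  # one key per row
looked_up = means.reindex(row_keys).to_numpy()       # group mean per row
kept_alt = df.loc[df["C"] >= looked_up]
```

Both variants select the same rows. The transform version is shorter and avoids building the intermediate MultiIndex, so it is usually the more convenient choice.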