How to filter a groupby object in pandas based on the difference of values within each group?

Question

I have a dataframe as listed below:

In []: import numpy as np
In []: import pandas as pd
In []: dff = pd.DataFrame({'A': np.arange(8),
                           'B': list('aabbbbcc'),
                           'C': np.random.randint(100, size=8)})

which I have grouped by column B:

In []: grouped = dff.groupby('B')

Now I want to filter dff based on the differences between values in column C within each group. For example, if the difference between two values of C within a group is greater than a threshold, those rows should be removed.
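Read literally, "the difference between any two values in a group exceeds the threshold" is a per-group test, and the largest pairwise difference in a group is simply max minus min. A minimal sketch of that literal reading with `GroupBy.filter`, assuming the sample C values from the table below (hard-coded, since the original frame uses random numbers):

```python
import pandas as pd

# Sample data copied from the question's table (the original used random C values).
dff = pd.DataFrame({'A': range(8),
                    'B': list('aabbbbcc'),
                    'C': [18, 25, 56, 62, 46, 56, 74, 3]})
threshold = 10

# The largest pairwise difference inside a group equals max - min.
grouped_c = dff.groupby('B')['C']
spread = grouped_c.max() - grouped_c.min()  # a -> 7, b -> 16, c -> 71

# Literal reading: drop a whole group if any two of its values differ by more
# than the threshold.
literal = dff.groupby('B').filter(
    lambda g: g['C'].max() - g['C'].min() <= threshold)
print(literal)  # only group 'a' survives
```

On this data the literal reading also drops group b (62 - 46 = 16), which does not match the expected output, so that output seems to call for a per-row rule instead.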

If dff is:

   A  B   C
0  0  a  18
1  1  a  25
2  2  b  56
3  3  b  62
4  4  b  46
5  5  b  56
6  6  c  74
7  7  c   3
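Because the frame above is built with `np.random.randint`, its C values cannot be reproduced directly. A small sketch that rebuilds exactly this sample, so the expected output can be checked:

```python
import pandas as pd

# Same columns as the question's frame, but with the C values from the table
# above hard-coded instead of drawn at random.
dff = pd.DataFrame({
    'A': range(8),
    'B': list('aabbbbcc'),
    'C': [18, 25, 56, 62, 46, 56, 74, 3],
})
print(dff)
```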

Then, a threshold of 10 for C will produce a final table like:

   A  B   C
0  0  a  18
1  1  a  25
2  2  b  56
3  3  b  62
4  4  b  46
5  5  b  56

Here group c (lowercase) is removed because its two values differ by more than 10, while group b keeps all of its rows because each of its values is within 10 of at least one other value in the group (46 and 56 differ by exactly 10).
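The expected output is consistent with a per-row rule: keep a row if at least one other row in the same group has a C value within the threshold (inclusive). A deliberately simple, loop-based sketch of that rule, useful as a reference rather than for speed; `keep_mask` is a hypothetical helper name:

```python
import pandas as pd

def keep_mask(df, by, col, threshold):
    """Hypothetical helper: True for rows that have at least one *other* row in
    the same `by` group whose `col` value is within `threshold` (inclusive)."""
    keep = pd.Series(False, index=df.index)
    for _, grp in df.groupby(by):
        vals = grp[col]
        for idx, v in vals.items():
            others = vals.drop(idx)  # exclude the row itself
            keep[idx] = bool((others - v).abs().le(threshold).any())
    return keep

dff = pd.DataFrame({'A': range(8),
                    'B': list('aabbbbcc'),
                    'C': [18, 25, 56, 62, 46, 56, 74, 3]})
print(dff[keep_mask(dff, 'B', 'C', 10)])  # rows 0-5; group c is dropped
```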

Answer 1

I think I'd do the hard work in NumPy:

In [11]: a = np.array([2, 3, 14, 15, 54])

In [12]: res = np.abs(a[:, np.newaxis] - a) < 10  # Note: perhaps you want <= 10.

In [13]: np.fill_diagonal(res, False)

In [14]: res.any(0)
Out[14]: array([ True,  True,  True,  True, False], dtype=bool)
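The key step is broadcasting: `a[:, np.newaxis]` has shape (5, 1) and `a` has shape (5,), so their difference is a full pairwise matrix. A small sketch that spells this out on the same array:

```python
import numpy as np

a = np.array([2, 3, 14, 15, 54])

# (5, 1) minus (5,) broadcasts to (5, 5): diff[i, j] == a[i] - a[j].
diff = a[:, np.newaxis] - a
close = np.abs(diff) < 10

# Each value is trivially close to itself, so clear the diagonal before asking
# whether a value has a close *other* value.
np.fill_diagonal(close, False)

print(diff.shape)         # (5, 5)
print(close.any(axis=0))  # [ True  True  True  True False]
```

The matrix is symmetric, so `any(axis=0)` and `any(axis=1)` give the same result. Memory grows with the square of the group size.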

You could wrap this in a function:

In [15]: def has_close(a, n=10):
             # Pairwise |a[i] - a[j]| < n, ignoring each value's comparison with itself.
             res = np.abs(a[:, np.newaxis] - a) < n
             np.fill_diagonal(res, False)
             return res.any(0)
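The strict `<` explains why row 4 (C = 46) is missing from the output below even though the question expects it: 46 and 56 differ by exactly 10. A sketch comparing the helper with a hypothetical inclusive variant on group b's values:

```python
import numpy as np

def has_close(a, n=10):
    # Same logic as the helper above: strict "< n".
    res = np.abs(a[:, np.newaxis] - a) < n
    np.fill_diagonal(res, False)
    return res.any(0)

def has_close_inclusive(a, n=10):
    # Hypothetical variant using "<= n", as the comment in In [12] suggests.
    res = np.abs(a[:, np.newaxis] - a) <= n
    np.fill_diagonal(res, False)
    return res.any(0)

b_values = np.array([56, 62, 46, 56])  # column C of group 'b' in the question
print(has_close(b_values))            # [ True  True False  True]
print(has_close_inclusive(b_values))  # [ True  True  True  True]
```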

In [16]: g = dff.groupby('B', as_index=False)

In [17]: g.apply(lambda x: x[has_close(x.C.values)])
Out[17]: 
   A  B   C
0  0  a  18
1  1  a  25
2  2  b  56
3  3  b  62
5  5  b  56
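For large groups the n x n matrix can get expensive. One alternative (a swapped-in technique, not the answer's method): after sorting a group by C, the closest other value to any element is its immediate predecessor or successor, so it is enough to check adjacent gaps with `diff`. A sketch assuming the inclusive rule; `filter_has_neighbour` is a hypothetical name:

```python
import pandas as pd

def filter_has_neighbour(df, by, col, threshold):
    """Hypothetical helper: keep rows whose `col` value is within `threshold`
    (inclusive) of at least one other row in the same `by` group."""
    # Sort by group, then by value, so neighbours in a group are adjacent rows.
    s = df.sort_values([by, col])
    g = s.groupby(by)[col]
    # diff() is NaN at group edges, and NaN <= threshold evaluates to False.
    gap_prev = g.diff().abs() <= threshold
    gap_next = g.diff(-1).abs() <= threshold
    # Realign the mask to the original row order before selecting.
    keep = (gap_prev | gap_next).reindex(df.index)
    return df[keep.to_numpy()]

dff = pd.DataFrame({'A': range(8),
                    'B': list('aabbbbcc'),
                    'C': [18, 25, 56, 62, 46, 56, 74, 3]})
print(filter_has_neighbour(dff, 'B', 'C', 10))  # rows 0-5
```

With integer data, a threshold of 9 here behaves like the answer's strict `< 10` and returns rows 0, 1, 2, 3 and 5.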

Answer 1: accepted, 2014-03-25 20:52:18 (score 0)