简体   繁体   中英

Count occurrences of row values in a given set with pandas

I have a dataframe similar to

     a   b   c   d   e
0   36  38  27  12  35
1   45  33   8  41  18
4   32  14   4  14   9
5   43   1  31  11   3
6   16   8   3  17  39
...

and I want, for each row, to count the occurrences of values in a given set.

I came up with the following code (Python 3) which seems to work, but I'm looking for efficiency, since my real dataframe is much more complex and big:

import pandas as pd
import numpy as np

def column():
    return [np.random.randint(0,49) for _ in range(20)]

df = pd.DataFrame({'a': column(),'b': column(),'c': column(),'d': column(),'e': column()})

given_set = {3,8,11,18,22,24,35,36,42,47}

def count_occurrences(row):
    return sum(col in given_set for col in (row.a,row.b,row.c,row.d,row.e))

df['count'] = df.apply(count_occurrences, axis=1)

print(df)

Is there a way to obtain the same result with pandas vectorial operators? (instead of Python function)

Thanks in advance.

IIUC you can use DataFrame.isin() method:

Data:

In [41]: given_set = {3,8,11,18,22,24,35,36,42,47}

In [42]: df
Out[42]:
    a   b   c   d   e
0  36  38  27  12  35
1  45  33   8  41  18
4  32  14   4  14   9
5  43   1  31  11   3
6  16   8   3  17  39

Solution:

In [44]: df['new'] = df.isin(given_set).sum(1)

In [45]: df
Out[45]:
    a   b   c   d   e  new
0  36  38  27  12  35    2
1  45  33   8  41  18    2
4  32  14   4  14   9    0
5  43   1  31  11   3    2
6  16   8   3  17  39    2

Explanation:

In [49]: df.isin(given_set)
Out[49]:
       a      b      c      d      e
0   True  False  False  False   True
1  False  False   True  False   True
4  False  False  False  False  False
5  False  False  False   True   True
6  False   True   True  False  False

In [50]: df.isin(given_set).sum(1)
Out[50]:
0    2
1    2
4    0
5    2
6    2
dtype: int64

UPDATE: if you want check for existence instead of counting, you can do it this way (thanks to @DSM ):

In [6]: df.isin(given_set).any(1)
Out[6]:
0     True
1     True
4    False
5     True
6     True
dtype: bool

In [7]: df.isin(given_set).any(1).astype(np.uint8)
Out[7]:
0    1
1    1
4    0
5    1
6    1
dtype: uint8

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM