简体   繁体   中英

How to count the number of occurrences on comma delimited column in Python Pandas

How can to count the number of occurrences of comma-separated values from the whole list of columns

data frame is like this:

id column
1   
2   1
3   1
4   1,2
5   1,2
6   1,2,4
7   1,2,4
8   1,2,4,6
9   1,2,4,6
10  1,2,4,6,8
11  1,2,4,6,8

Desired output is:

id column count
1.         10
2   1.     7    
3   1.     0
4   1,2.   6
5   1,2.   0
6   1,2,4. 4  
7   1,2,4.  0
8   1,2,4,6. 2
9   1,2,4,6. 0
10  1,2,4,6,8 0
11  1,2,4,6,8 0

Tried this:

df = pd.read_csv('parentsplit/parentlist.csv')
df['count'] = df['parent_list'].str.split(',', expand=True).stack().value_counts()


its not working.

You can do as follows:

df['count'] = df['id'].apply(lambda x: df['column'].fillna('X').str.contains(str(x)).sum())

This is basically counting the number of occurence of each id in the column.

Output:

    id     column  count
0    1       None   10
1    2          1    8
2    3          1    0
3    4        1,2    6
4    5        1,2    0
5    6      1,2,4    4
6    7      1,2,4    0
7    8    1,2,4,6    2
8    9    1,2,4,6    0
9   10  1,2,4,6,8    0
10  11  1,2,4,6,8    0

Split and explode the column, then count the occurrences using value_counts then map the counts onto id column

s = df['column'].str.split(',').explode().value_counts()
df['count'] = df['id'].astype(str).map(s).fillna(0)

    id     column  count
0    1       None   10.0
1    2          1    8.0
2    3          1    0.0
3    4        1,2    6.0
4    5        1,2    0.0
5    6      1,2,4    4.0
6    7      1,2,4    0.0
7    8    1,2,4,6    2.0
8    9    1,2,4,6    0.0
9   10  1,2,4,6,8    0.0
10  11  1,2,4,6,8    0.0

A fast method would be to not use pandas methods, but pure python: itertools.chain and collections.Counter :

from itertools import chain
from collections import Counter
c = Counter(chain(*df['column'].str.split(',').values))
df['count'] = df['id'].astype(str).map(c)

output:

    id     column  count
0    1                10
1    2          1      8
2    3          1      0
3    4        1,2      6
4    5        1,2      0
5    6      1,2,4      4
6    7      1,2,4      0
7    8    1,2,4,6      2
8    9    1,2,4,6      0
9   10  1,2,4,6,8      0
10  11  1,2,4,6,8      0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM