
pandas: get a count of occurrences given a list

Let's say I have something like this:

user_id,service
------------------
user_1,service1
user_2,service1
user_3,service2
user_1,service2
user_3,service1
user_3,service2

What I would eventually like to have is this:

user_id, service1, service2
----------------------------
user_1, 1, 1
user_2, 1, 0
user_3, 1, 2

So far, here is my code:

import pandas

data = pandas.read_csv('dataset.csv')

# Group the service column by user, then count each service per user
service_by_user = data['service'].groupby(data['user_id'])

count_occurences_services = service_by_user.value_counts()

This is what I get with my code:

user_1   service1    1
         service2    1
user_2   service1    1
         service2    0
user_3   service1    1
         service2    2

But then I don't know how to get from there to what I want.

Note: I have far more users and services than in this example, and not all users use all the services; in fact, most use at most 3 or 4 of them. I have an array of all the services in use, built like this:

service_by_user = data.set_index('user_id')
list_services = service_by_user.service.unique()

You can use pivot_table:

data.pivot_table(index=['user_id'], columns=['service'], aggfunc='size', fill_value=0)

service  service1  service2
user_id                    
user_1          1         1
user_2          1         0
user_3          1         2
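To try the call above without the original file, here is a minimal sketch that inlines the six sample rows from the question. The variable names csv_text and counts are only illustrative, and the use of io.StringIO in place of dataset.csv is an assumption made so the snippet is self-contained.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so no dataset.csv is needed
# (assumption: the real file has the same two columns).
csv_text = """user_id,service
user_1,service1
user_2,service1
user_3,service2
user_1,service2
user_3,service1
user_3,service2
"""
data = pd.read_csv(io.StringIO(csv_text))

# aggfunc='size' counts the rows in each (user_id, service) group;
# fill_value=0 fills in the combinations that never occur.
counts = data.pivot_table(
    index="user_id", columns="service", aggfunc="size", fill_value=0
)
print(counts)
```

Each cell of counts is the number of rows for that user and service, so a user who never used a service gets 0 rather than a missing value.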

With some additional formatting:

data.pivot_table(index=['user_id'], columns=['service'], aggfunc='size', fill_value=0) \
    .rename_axis(None, axis=1) \
    .reset_index()

  user_id  service1  service2
0  user_1         1         1
1  user_2         1         0
2  user_3         1         2
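As a possible alternative, the same table can probably be reached with pandas.crosstab, or by continuing from a groupby as in the question. The sketch below assumes the sample data from the question; the names by_crosstab, by_groupby and with_all_services are hypothetical. The final reindex step is one way to make sure every entry of list_services appears as a column, which may matter if the counts are computed on a subset of the rows.

```python
import pandas as pd

# Sample data from the question, built in memory for illustration.
data = pd.DataFrame({
    "user_id": ["user_1", "user_2", "user_3", "user_1", "user_3", "user_3"],
    "service": ["service1", "service1", "service2",
                "service2", "service1", "service2"],
})

# Option 1: crosstab counts co-occurrences of the two columns directly.
by_crosstab = pd.crosstab(data["user_id"], data["service"])

# Option 2: count rows per (user_id, service) pair, then move the
# service level of the index into columns, filling missing pairs with 0.
by_groupby = data.groupby(["user_id", "service"]).size().unstack(fill_value=0)

# Optional: force one column per known service, in a fixed order.
list_services = data["service"].unique()
with_all_services = by_groupby.reindex(columns=list_services, fill_value=0)

print(by_crosstab)
print(with_all_services)
```

Both options should give the same counts as pivot_table here; which one reads best is largely a matter of taste.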
