Python/Pandas: Pivot table

In a Jupyter notebook, I have a DataFrame built by merging several datasets.

record_id | song_id | user_id   | number_times_listened
0         | ABC     | Shjkn4987 | 3
1         | ABC     | Dsfds2347 | 15
2         | ABC     | Fkjhh9849 | 7
3         | XYZ     | Shjkn4987 | 20
4         | XXX     | Shjkn4987 | 5
5         | XXX     | Swjdh0980 | 1

I would like to create a pivot table dataframe by song_id listing the number of user_ids and the sum of number_times_listened.

I know that I need to create a for loop with the count and sum functions, but I cannot make it work. I also tried the pandas module's pd.pivot_table.

df = pd.pivot_table(data, index='song_ID', columns='userID', values='number_times_listened', aggfunc='sum')

Or something like this?

total_user=[]
total_times_listened =[]
for x in data: 
    total_user.append(sum('user_id'))
    total_times_listened.append(count('number_times_listened'))
return df('song_id','total_user','total_times_listened')

You can pass a dictionary of column names as keys and a list of functions as values:

funcs = {'number_times_listened':['sum'], 'user_id':['count']}

Then group by the song_id column and pass that dictionary to agg:

df.groupby('song_id').agg(funcs)

The output:

        number_times_listened user_id
                          sum   count
song_id
ABC                        25       3
XXX                         6       2
XYZ                        20       1
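For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and produces the same summary with flat column names. The output names total_users and total_times_listened are illustrative (borrowed from the list names in the question), and 'nunique' is swapped in for 'count' on the assumption that distinct users are wanted; on this data the two give the same numbers. The last call is an equivalent pd.pivot_table form, since the question asked about it.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "record_id": [0, 1, 2, 3, 4, 5],
    "song_id": ["ABC", "ABC", "ABC", "XYZ", "XXX", "XXX"],
    "user_id": ["Shjkn4987", "Dsfds2347", "Fkjhh9849",
                "Shjkn4987", "Shjkn4987", "Swjdh0980"],
    "number_times_listened": [3, 15, 7, 20, 5, 1],
})

# Named aggregation (pandas 0.25+): one flat output column per keyword.
# The keyword names are illustrative, not required by pandas.
summary = data.groupby("song_id").agg(
    total_users=("user_id", "nunique"),
    total_times_listened=("number_times_listened", "sum"),
)
print(summary)

# Same idea through pivot_table: index only, one aggregation per column.
pivot_summary = pd.pivot_table(
    data,
    index="song_id",
    values=["user_id", "number_times_listened"],
    aggfunc={"user_id": "nunique", "number_times_listened": "sum"},
)
print(pivot_summary)
```

If a user can appear more than once for the same song, 'count' and 'nunique' will diverge, so pick whichever matches what "number of user_ids" should mean.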

Not sure if this is related, but the column names and casing in your example data (song_id, user_id) don't match the ones in your Python code (song_ID, userID).

In any case, the following works for me on Python 2.7:

CSV File:

record_id   song_id user_id number_times_listened
0   ABC Shjkn4987   3
1   ABC Dsfds2347   15
2   ABC Fkjhh9849   7
3   XYZ Shjkn4987   20
4   XXX Shjkn4987   5
5   XXX Swjdh0980   1

Python code:

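import pandas as pd

# read_csv splits on commas by default; pass sep=r'\s+' if the file is whitespace-delimited as displayed above
csv_data = pd.read_csv('songs.csv')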

df = pd.pivot_table(csv_data, index='song_id', columns='user_id', values='number_times_listened', aggfunc='sum').fillna(0)

The resulting pivot table looks like:

user_id  Dsfds2347  Fkjhh9849  Shjkn4987  Swjdh0980
song_id
ABC             15          7          3          0
XXX              0          0          5          1
XYZ              0          0         20          0

Is this what you're looking for? Keep in mind that each (song_id, user_id) pair is unique in your dataset, so the aggregate function isn't actually doing anything in this specific example: every cell of the pivot holds at most one row, and there is nothing to combine.
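That last point can be checked directly. The sketch below is only an illustration: it reads the sample rows from an inline string standing in for songs.csv (assumed comma-separated), confirms that no (song_id, user_id) pair repeats, and compares the pivot built with 'sum' against one built with 'max'. The helper wide_pivot is a made-up name for this example, and fill_value=0 is used in place of chaining .fillna(0).

```python
import io
import pandas as pd

# Inline stand-in for songs.csv; comma-separated here as an assumption.
csv_text = """record_id,song_id,user_id,number_times_listened
0,ABC,Shjkn4987,3
1,ABC,Dsfds2347,15
2,ABC,Fkjhh9849,7
3,XYZ,Shjkn4987,20
4,XXX,Shjkn4987,5
5,XXX,Swjdh0980,1
"""
csv_data = pd.read_csv(io.StringIO(csv_text))

# True if any (song_id, user_id) pair occurs more than once.
has_repeats = bool(csv_data.duplicated(subset=["song_id", "user_id"]).any())


def wide_pivot(aggfunc):
    """Hypothetical helper: songs as rows, users as columns, missing cells as 0."""
    return pd.pivot_table(
        csv_data,
        index="song_id",
        columns="user_id",
        values="number_times_listened",
        aggfunc=aggfunc,
        fill_value=0,
    )


wide_sum = wide_pivot("sum")
wide_max = wide_pivot("max")

# With at most one row per cell, the choice of aggregate makes no difference.
same_result = bool((wide_sum == wide_max).all().all())
print(has_repeats, same_result)
```

Once the data contains repeated pairs, the two pivots would differ, and the choice of aggfunc starts to matter.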
