In a Jupyter notebook, I have a dataframe created by merging several datasets.
record_id | song_id | user_id | number_times_listened
0 |ABC | Shjkn4987 | 3
1 |ABC | Dsfds2347 | 15
2 |ABC | Fkjhh9849 | 7
3 |XYZ | Shjkn4987 | 20
4 |XXX | Shjkn4987 | 5
5 |XXX | Swjdh0980 | 1
I would like to create a pivot-table-style dataframe, grouped by song_id, that lists the number of user_ids and the sum of number_times_listened for each song.
I know that I need to create a for loop with the count and sum functions, but I cannot make it work. I also tried the pandas module's pd.pivot_table.
df = pd.pivot_table(data, index='song_ID', columns='userID', values='number_times_listened', aggfunc='sum')
Or something like this?
total_user=[]
total_times_listened =[]
for x in data:
total_user.append(sum('user_id'))
total_times_listened.append(count('number_times_listened'))
return df('song_id','total_user','total_times_listened')
You can pass a dictionary of column names as keys and a list of functions as values:
funcs = {'number_times_listened':['sum'], 'user_id':['count']}
Then simply use df.groupby on column song_id:
df.groupby('song_id').agg(funcs)
The output:
         number_times_listened  user_id
                           sum    count
song_id
ABC                         25        3
XXX                          6        2
XYZ                         20        1
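If flat column names are preferred over the two-level header above, named aggregation is one option. The following is a minimal sketch, assuming pandas 0.25 or newer and the sample data from the question; the output names total_users and total_times_listened are illustrative only. It uses nunique rather than count so that a user appearing twice for the same song would be counted once; on this sample the two give the same numbers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "record_id": [0, 1, 2, 3, 4, 5],
    "song_id": ["ABC", "ABC", "ABC", "XYZ", "XXX", "XXX"],
    "user_id": ["Shjkn4987", "Dsfds2347", "Fkjhh9849",
                "Shjkn4987", "Shjkn4987", "Swjdh0980"],
    "number_times_listened": [3, 15, 7, 20, 5, 1],
})

# Named aggregation: each keyword becomes one flat output column,
# defined as (source column, aggregation function).
summary = df.groupby("song_id").agg(
    total_users=("user_id", "nunique"),                     # distinct users per song
    total_times_listened=("number_times_listened", "sum"),  # total listens per song
)

print(summary)
```

Calling summary.reset_index() afterwards turns song_id back into an ordinary column if that is more convenient.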
Not sure if this is related, but the column names and casing in your example don't match your Python code.
In any case, the following works for me on Python 2.7:
CSV File:
record_id song_id user_id number_times_listened
0 ABC Shjkn4987 3
1 ABC Dsfds2347 15
2 ABC Fkjhh9849 7
3 XYZ Shjkn4987 20
4 XXX Shjkn4987 5
5 XXX Swjdh0980 1
Python code:
import pandas as pd
csv_data = pd.read_csv('songs.csv')
df = pd.pivot_table(csv_data, index='song_id', columns='user_id', values='number_times_listened', aggfunc='sum').fillna(0)
The resulting pivot table looks like:
user_id  Dsfds2347  Fkjhh9849  Shjkn4987  Swjdh0980
song_id
ABC             15          7          3          0
XXX              0          0          5          1
XYZ              0          0         20          0
Is this what you're looking for? Keep in mind that (song_id, user_id) pairs are unique in your dataset, so the aggregate function isn't actually doing anything in this specific example since there's nothing to group by on these two columns.
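The uniqueness point can be checked directly. This is a rough sketch, assuming the sample rows from the question are built in memory rather than read from songs.csv; the variable names are illustrative. It confirms that no (song_id, user_id) pair repeats and that swapping the aggregate function therefore leaves the pivot table unchanged.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "record_id": [0, 1, 2, 3, 4, 5],
    "song_id": ["ABC", "ABC", "ABC", "XYZ", "XXX", "XXX"],
    "user_id": ["Shjkn4987", "Dsfds2347", "Fkjhh9849",
                "Shjkn4987", "Shjkn4987", "Swjdh0980"],
    "number_times_listened": [3, 15, 7, 20, 5, 1],
})

# True if any (song_id, user_id) pair appears more than once.
has_duplicates = bool(data.duplicated(subset=["song_id", "user_id"]).any())

# Build the same wide table with two different aggregate functions.
pivot_args = dict(index="song_id", columns="user_id",
                  values="number_times_listened", fill_value=0)
wide_sum = pd.pivot_table(data, aggfunc="sum", **pivot_args)
wide_max = pd.pivot_table(data, aggfunc="max", **pivot_args)

# With one row per pair, every cell aggregates a single value,
# so sum and max produce identical tables.
same_result = bool((wide_sum == wide_max).all().all())

print(has_duplicates, same_result)
```

If the real merged data does contain repeated pairs, the choice of aggfunc starts to matter and 'sum' is probably the one that matches the intent of totalling listens.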