[英]Groupby aggregate and transpose in pandas
df= df=
Genre Song Singer Playlist Album
Rock Evil Walks AC/DC Music For Those About To Rock We Salute You
Rock Snowballed AC/DC Music For Those About To Rock We Salute You
Rock C.O.D AC/DC Music For Those About To Rock We Salute You
Rock Perfect Alanis Morissette Music Jagged Little Pill
Rock Forgiven Alanis Morissette Music Jagged Little Pill
Metal Sad But True Apocalyptica Music Plays Metallica By Four Cellos
Metal All For You Black Label Society Music Alcohol Fueled Brewtality Live! [Disc 1]
Blues Layla Eric Clapton Music The Cream Of Clapton
Blues Crossroads Eric Clapton Music The Cream Of Clapton
.......
......
....
Latin Etnia Chico Science Music Afrociberdelia
Off all the genres in the genre field, I only need to consider 'Rock', 'Latin', 'Metal', 'Blues' and build a new dataframe based on the following requirements在流派领域的所有流派中,我只需要考虑“摇滚”、“拉丁”、“金属”、“布鲁斯”,并根据以下要求构建一个新的 dataframe
a.how many songs the singer has from that genre (count of each genre must be in a separate column). a. 歌手有多少歌曲来自该流派(每种流派的计数必须在单独的列中)。
b.Count of how many albums the singer has in the data. b. 歌手在数据中拥有多少张专辑的计数。
c.Count of how many tracks the singer has in the data. c. 歌手在数据中的曲目数。
d.Count of how many playlists that include any Song of the singer. d. 包含歌手任何歌曲的播放列表的计数。
Desired Output:所需的 Output:
Singer Rock Latin Metal Blues CountofAlbums CountofSongs Count of Playlists
AC/DC 5 7 8 2 4 22 2
Metallica 8 0 22 0 6 30 6
Iron Maiden 21 0 27 13 10 61 12
I was going to create one df for part a and one for parts b,c,d and merge them.我打算为a部分创建一个df,为b部分创建一个df,c,d并将它们合并。
For parts b,c and d.对于零件 b、c 和 d。 I thought of looping over singer names and using nunique to get distinct count, but did not realize, the loop would also return column headers everytime.
我想过循环歌手姓名并使用 nunique 来获得不同的计数,但没有意识到,循环也会每次都返回列标题。
mylist=list(set(df.Singer))
for i in mylist:
temp=df[df['Singer']==i]
df2=temp.nunique().to_frame().T
For part A, I was going to group songs by genre find a count and do a transpose对于 A 部分,我打算按流派对歌曲进行分组查找计数并进行转置
mylist=list(set(df.Singer))
for i in mylist:
group=df4.groupby('Genre_Name').agg(count=('Song','count'))
newdf=group.T
Any help will be greatly appreciated!任何帮助将不胜感激!
Can be done in one line but it's a bit of a mouthful...可以一行完成,但是有点拗口...
df = pd.DataFrame({
'Genre':['Rock']*5+['Metal']*2+['Blues']*2+['Latin'],
'Song':['Evil Walks','Snowballed','C.O.D','Perfect','Forgiven','Sad But True',
'All For You','Layla','Crossroads','Etnia'],
'Singer':['AC/DC']*3+['Alanis Morissette']*2+['Apocalyptica']+['Black Label Society']+['Eric Clapton']*2+['Chico Science'],
'Playlist':['Music']*10,
'Album':['For Those About To Rock We Salute You']*3+['Jagged Little Pill']*2+['Plays Metallica By Four Cellos']+['Alcohol Fueled Brewtality Live! [Disc 1]']+['The Cream Of Clapton']*2+['Afrociberdelia']
})
agg_df=df.groupby('Singer').agg({'Song':'count'})
agg_df=agg_df.join(df[['Singer','Album']].drop_duplicates().groupby('Singer').count())
agg_df=agg_df.join(df[['Singer','Playlist']].drop_duplicates().groupby('Singer').count())
agg_df=agg_df.join(df.reset_index()[['Singer','Genre','index']].groupby(['Singer','Genre']).count().rename({'index':'count'},axis=1).unstack().fillna(0).astype(np.int16))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.