Groupby aggregate and transpose in pandas

df=

Genre Song          Singer               Playlist           Album
Rock  Evil Walks      AC/DC                 Music            For Those About To Rock We Salute You
Rock  Snowballed      AC/DC                 Music            For Those About To Rock We Salute You
Rock  C.O.D           AC/DC                 Music            For Those About To Rock We Salute You         
Rock  Perfect         Alanis Morissette     Music            Jagged Little Pill
Rock  Forgiven        Alanis Morissette     Music            Jagged Little Pill
Metal Sad But True    Apocalyptica          Music            Plays Metallica By Four Cellos
Metal All For You     Black Label Society   Music            Alcohol Fueled Brewtality Live! [Disc 1]
Blues Layla           Eric Clapton          Music            The Cream Of Clapton
Blues Crossroads      Eric Clapton          Music            The Cream Of Clapton
.......
......
....
Latin Etnia           Chico Science         Music            Afrociberdelia

Of all the genres in the Genre field, I only need to consider 'Rock', 'Latin', 'Metal' and 'Blues', and build a new dataframe based on the following requirements:

a. How many songs the singer has from each genre (the count for each genre must be in a separate column).

b. Count of how many albums the singer has in the data.

c. Count of how many tracks the singer has in the data.

d. Count of how many playlists include any song of the singer.

Desired Output:

Singer       Rock  Latin  Metal  Blues   CountofAlbums   CountofSongs  Count of Playlists
AC/DC         5      7    8      2         4                22             2
Metallica     8      0    22     0         6                30             6       
Iron Maiden   21     0    27     13        10               61             12

I was going to create one df for part a and one for parts b, c and d, and merge them.

For parts b, c and d, I thought of looping over singer names and using nunique to get the distinct counts, but did not realize the loop would also return the column headers every time.

mylist=list(set(df.Singer))
for i in mylist:
    temp=df[df['Singer']==i]
    df2=temp.nunique().to_frame().T
    

For part a, I was going to group songs by genre, find a count and do a transpose.

mylist=list(set(df.Singer))
for i in mylist:
   group=df4.groupby('Genre_Name').agg(count=('Song','count'))
   newdf=group.T

Any help will be greatly appreciated!

Can be done in one line but it's a bit of a mouthful...

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Genre':['Rock']*5+['Metal']*2+['Blues']*2+['Latin'],
    'Song':['Evil Walks','Snowballed','C.O.D','Perfect','Forgiven','Sad But True',
    'All For You','Layla','Crossroads','Etnia'],
    'Singer':['AC/DC']*3+['Alanis Morissette']*2+['Apocalyptica']+['Black Label Society']+['Eric Clapton']*2+['Chico Science'],
    'Playlist':['Music']*10,
    'Album':['For Those About To Rock We Salute You']*3+['Jagged Little Pill']*2+['Plays Metallica By Four Cellos']+['Alcohol Fueled Brewtality Live! [Disc 1]']+['The Cream Of Clapton']*2+['Afrociberdelia']
    })

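# Total number of songs per singer
agg_df = df.groupby('Singer').agg({'Song': 'count'})
# Number of distinct albums per singer
agg_df = agg_df.join(df[['Singer', 'Album']].drop_duplicates().groupby('Singer').count())
# Number of distinct playlists per singer
agg_df = agg_df.join(df[['Singer', 'Playlist']].drop_duplicates().groupby('Singer').count())
# Songs per singer and genre, one column per genre. size() returns a Series,
# so unstack() gives flat column labels that join with the columns above.
agg_df = agg_df.join(df.groupby(['Singer', 'Genre']).size().unstack(fill_value=0).astype(np.int16))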
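
The joins above do not restrict the data to the four genres the question asks about, and the resulting columns are not named as in the desired output. A possible alternative, shown here only as a sketch: filter with `isin`, build the per-genre columns with `pd.crosstab`, and get parts b, c and d from a single `groupby` with named aggregation. The output column names and the choice to compute the album, song and playlist counts on the filtered rows (rather than on the full data) are assumptions, since the question does not settle either point.

```python
import pandas as pd

# Sample rows copied from the question; the real data has more rows.
df = pd.DataFrame({
    'Genre': ['Rock'] * 5 + ['Metal'] * 2 + ['Blues'] * 2 + ['Latin'],
    'Song': ['Evil Walks', 'Snowballed', 'C.O.D', 'Perfect', 'Forgiven',
             'Sad But True', 'All For You', 'Layla', 'Crossroads', 'Etnia'],
    'Singer': ['AC/DC'] * 3 + ['Alanis Morissette'] * 2 + ['Apocalyptica']
              + ['Black Label Society'] + ['Eric Clapton'] * 2 + ['Chico Science'],
    'Playlist': ['Music'] * 10,
    'Album': ['For Those About To Rock We Salute You'] * 3
             + ['Jagged Little Pill'] * 2
             + ['Plays Metallica By Four Cellos']
             + ['Alcohol Fueled Brewtality Live! [Disc 1]']
             + ['The Cream Of Clapton'] * 2
             + ['Afrociberdelia'],
})

# Keep only the genres of interest.
genres = ['Rock', 'Latin', 'Metal', 'Blues']
subset = df[df['Genre'].isin(genres)]

# Part a: songs per singer, one column per genre (missing genres become 0).
genre_counts = (
    pd.crosstab(subset['Singer'], subset['Genre'])
    .reindex(columns=genres, fill_value=0)
    .rename_axis(columns=None)
)

# Parts b, c, d: distinct albums, number of songs, distinct playlists.
# Assumption: counted on the filtered rows; use df instead of subset
# to count over all genres.
totals = subset.groupby('Singer').agg(
    CountofAlbums=('Album', 'nunique'),
    CountofSongs=('Song', 'count'),
    CountofPlaylists=('Playlist', 'nunique'),
)

# Both frames are indexed by Singer, so a plain join lines them up.
result = genre_counts.join(totals).reset_index()
print(result)
```

No loop over singers and no transpose is needed here: `crosstab` already puts one genre in each column, and `nunique` does the distinct counting that `drop_duplicates().groupby().count()` does in the joins above.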
