Pandas Groupby: Select Groups That Have More Than One Unique Value in a Column
I have a dataframe with information about some artists, their albums, and their tracks.
import pandas as pd
df = pd.DataFrame({'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'], 'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312], 'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089]})
Artist A has 2 albums (201 and 451), one with 2 tracks (1022 and 3472) and one with 1 track (9866).
Artist B has 1 album (390) with 2 tracks (6078 and 2634).
Artist C has 2 albums (272 and 698), with each album having 2 tracks.
Artist D has 1 album (235) with 1 track (9874).
Artist E has 1 album (312) with 1 track (1089).
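As a quick sanity check of the description above, here is a minimal sketch that summarises the sample data per artist. The names `summary`, `albums`, and `tracks` are labels I made up for illustration; they are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# One row per artist: number of distinct albums and number of track rows.
# 'albums' and 'tracks' are hypothetical column names chosen for this sketch.
summary = df.groupby('Artist').agg(
    albums=('AlbumId', 'nunique'),
    tracks=('TrackId', 'count'),
)
print(summary)
```

Only A and C end up with an `albums` value above 1, which matches the description.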
I want to find the artists who have more than 1 album, and get the rows of these artists accordingly. My desired output contains all rows for artists A and C, and none for B, D, or E.
I have tried:
groupedArtists = data.groupby(['ArtistId', 'AlbumId']).filter(lambda group: (group.AlbumId.nunique() > 1))
But it doesn't seem to work as expected.
Could someone please help me out? I appreciate it!
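For reference, a small sketch of why the attempt returns nothing, assuming the artist column is named `Artist` as in the sample `df` (the attempt above refers to `data` and `ArtistId`). When `AlbumId` is part of the grouping key, every group holds exactly one album, so `nunique() > 1` can never be true.

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# Grouping by both columns creates one group per (artist, album) pair.
grouped = df.groupby(['Artist', 'AlbumId'])
print(grouped.ngroups)

# Each group contains a single AlbumId, so the condition fails for all of them
# and filter() drops every row.
attempt = grouped.filter(lambda group: group['AlbumId'].nunique() > 1)
print(attempt.empty)
```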
You want to group by only Artist (ArtistId in your code), and not AlbumId:
groupedArtists = data.groupby(['Artist']).filter(lambda x: x['AlbumId'].nunique() > 1)
Output:
>>> groupedArtists
  Artist  AlbumId  TrackId
0      A      201     1022
1      A      201     3472
2      A      451     9866
5      C      272     3411
6      C      272     8673
7      C      698     2543
...
Grouping should be solely by Artist.
Then, for each group, check how many (different) albums it contains and take only groups having more than 1 album.
So the proper solution is:
data.groupby('Artist').filter(lambda grp: grp.AlbumId.nunique() > 1)
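An alternative that avoids calling a Python lambda for each group is a boolean mask built with `transform('nunique')`. This is only a sketch of a different technique, not what the answers above use; `mask` and `result` are names I picked for illustration, and it is run here on the sample `df` from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# transform() broadcasts each artist's distinct-album count back onto every
# row of that artist, so the comparison yields a row-aligned boolean mask.
mask = df.groupby('Artist')['AlbumId'].transform('nunique') > 1
result = df[mask]
print(result)
```

On the sample data this should select the same rows as the `filter` version, with the original index preserved.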
This is the solution I found, which is a little more verbose, but perhaps more easily understandable:
# count distinct albums per artist (size() would count rows, i.e. tracks, and wrongly keep B)
counted = df.groupby(['Artist'])['AlbumId'].nunique().reset_index(name='counts')
df[df['Artist'].isin(counted[counted.counts > 1].Artist)]
You can create an aggregated DataFrame with the number of distinct albums per artist, and then filter on the number of albums you want:
stats = df.groupby(['Artist'])['AlbumId'].nunique().reset_index()
morethan1 = stats.loc[stats['AlbumId'] > 1]
# morethan1 has one row per artist; use it to select the original rows
df[df['Artist'].isin(morethan1['Artist'])]
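One thing to watch with the aggregate-then-filter approach: `count()` and `size()` count rows (tracks here), while `nunique()` counts distinct albums. A short comparison on the question's sample data, as a sketch (the variable names are mine):

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# count() gives the number of non-null AlbumId rows per artist.
rows_per_artist = df.groupby('Artist')['AlbumId'].count()
# nunique() gives the number of distinct AlbumId values per artist.
albums_per_artist = df.groupby('Artist')['AlbumId'].nunique()

by_rows = sorted(rows_per_artist[rows_per_artist > 1].index)
by_albums = sorted(albums_per_artist[albums_per_artist > 1].index)
print(by_rows)    # ['A', 'B', 'C']
print(by_albums)  # ['A', 'C']
```

Artist B has two tracks on a single album, so a row count would include B even though B has only one album.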