Pandas Groupby: Select Groups That Have More Than One Unique Value in a Column

Question

I have a DataFrame with information about some artists, their albums, and their tracks.

import pandas as pd
df = pd.DataFrame({'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'], 'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312], 'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089]})

[Image: the DataFrame]

Artist A has 2 albums (201 and 451): one with 2 tracks (1022 and 3472) and one with 1 track (9866).

Artist B has 1 album (390) with 2 tracks (6078 and 2634).

Artist C has 2 albums (272 and 698), each with 2 tracks.

Artist D has 1 album (235) with 1 track (9874).

Artist E has 1 album (312) with 1 track (1089).
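The album and track breakdown above can be checked directly in pandas. A minimal sketch that re-creates the question's `df` so it runs on its own; `albums_by_artist` and `tracks_by_album` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# Distinct album IDs per artist, in order of first appearance.
albums_by_artist = df.groupby('Artist')['AlbumId'].unique()

# Track IDs collected into a list for each (artist, album) pair.
tracks_by_album = df.groupby(['Artist', 'AlbumId'])['TrackId'].apply(list)

print(albums_by_artist)
print(tracks_by_album)
```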

I want to find the artists who have more than 1 album and get all the rows for those artists. My desired output looks like this:

[Image: desired output]

I have tried:

groupedArtists = df.groupby(['Artist', 'AlbumId']).filter(lambda group: group.AlbumId.nunique() > 1)

But it doesn't seem to work as expected.

Could someone please help me out? I appreciate it!

Answer 1

You want to group by Artist only, not by AlbumId:

groupedArtists = df.groupby('Artist').filter(lambda x: x['AlbumId'].nunique() > 1)

Output:

>>> groupedArtists
   Artist  AlbumId  TrackId
0       A      201     1022
1       A      201     3472
2       A      451     9866
5       C      272     3411
6       C      272     8673
7       C      698     2543
8       C      698     5837
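To see why the attempt in the question comes back empty, count the distinct albums inside each group under both groupings. A minimal sketch that re-creates the question's data; `by_pair` and `by_artist` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# Grouping by both keys makes each (Artist, AlbumId) pair its own group,
# so every group holds exactly one distinct AlbumId.
by_pair = [g['AlbumId'].nunique() for _, g in df.groupby(['Artist', 'AlbumId'])]

# Grouping by Artist alone puts all of an artist's albums in one group.
by_artist = {name: g['AlbumId'].nunique() for name, g in df.groupby('Artist')}

print(by_pair)    # [1, 1, 1, 1, 1, 1, 1]
print(by_artist)  # {'A': 2, 'B': 1, 'C': 2, 'D': 1, 'E': 1}
```

Because no pair-level group ever contains more than one album, `nunique() > 1` is never true there, so `filter` keeps no rows.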

Answer 2

Grouping should be by Artist alone.

Then, for each group, check how many distinct albums it contains and keep only the groups with more than 1 album.

So the proper solution is:

df.groupby('Artist').filter(lambda grp: grp.AlbumId.nunique() > 1)
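The same per-artist check can also be written as a boolean row mask instead of `filter`. This swaps in `transform`, which is not part of this answer; a sketch that re-creates the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

# transform('nunique') returns one value per row: the number of distinct
# albums of that row's artist, aligned with df's index.
albums_per_row = df.groupby('Artist')['AlbumId'].transform('nunique')

# A plain boolean mask keeps the rows of artists with more than one album.
result = df[albums_per_row > 1]
print(result)
```

This avoids calling a Python lambda once per group, which may help when there are many artists; for data of this size either form works.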

Answer 3

This is the solution I found. It is a little more verbose, but perhaps easier to understand:

# count distinct albums per artist (size() would count rows, i.e. tracks)
counted = df.groupby('Artist')['AlbumId'].nunique().reset_index(name='counts')
df[df['Artist'].isin(counted[counted.counts > 1].Artist)]
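Whether you count rows or distinct albums matters for this data: artist B has two tracks on a single album, so a row count selects B while a distinct-album count does not. A small comparison sketch, re-creating the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

rows_per_artist = df.groupby('Artist').size()                  # rows, i.e. tracks
albums_per_artist = df.groupby('Artist')['AlbumId'].nunique()  # distinct albums

selected_by_rows = sorted(rows_per_artist[rows_per_artist > 1].index)
selected_by_albums = sorted(albums_per_artist[albums_per_artist > 1].index)

print(selected_by_rows)    # ['A', 'B', 'C']
print(selected_by_albums)  # ['A', 'C']
```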

Answer 4

You can create an aggregated DataFrame with the number of albums per artist and then filter on the number of albums you want:

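# nunique() counts distinct albums; count() would count rows (tracks)
stats = df.groupby('Artist')['AlbumId'].nunique().reset_index()
morethan1 = stats.loc[stats['AlbumId'] > 1]
df[df['Artist'].isin(morethan1['Artist'])]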
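Filtering on "the number of albums you want" generalizes to any threshold. A minimal sketch; `rows_for_artists_with_albums` and its `min_albums` parameter are hypothetical names, and it counts distinct albums with `nunique()`:

```python
import pandas as pd


def rows_for_artists_with_albums(frame: pd.DataFrame, min_albums: int = 2) -> pd.DataFrame:
    """Return every row of artists that have at least `min_albums` distinct albums.

    Hypothetical helper for illustration; column names follow the question.
    """
    album_counts = frame.groupby('Artist')['AlbumId'].nunique()
    keep = album_counts[album_counts >= min_albums].index
    return frame[frame['Artist'].isin(keep)]


df = pd.DataFrame({
    'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'],
    'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312],
    'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089],
})

print(rows_for_artists_with_albums(df))     # rows of artists A and C
print(rows_for_artists_with_albums(df, 1))  # every row
```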

Answers: 4

Answer 1: score 2, posted 2021-12-05 16:28:31

Answer 2: score 1, posted 2021-12-05 16:30:03

Answer 3: score 0, posted 2021-12-05 16:40:04

Answer 4: score 0, posted 2021-12-05 17:09:11
