
How to groupby a DataFrame based on list elements in a column

I have a DataFrame like this:

   movie_id genres
0         2  [1,2]
1         3  [1,3]
2         4  [2,4]

I want to group the movies by genre, repeating a movie in every genre group it belongs to. Like this:

   genre_group movie_id genres
0            1        2  [1,2]
1                     3  [1,3]
0            2        2  [1,2]
2                     4  [2,4]
1            3        3  [1,3]
2            4        4  [2,4]
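To make the target grouping concrete, here is a minimal sketch (not part of the original question) that lists which movies fall into each genre. It assumes pandas 0.25 or later, where DataFrame.explode exists, and rebuilds the sample frame from the question; the name movies_by_genre is purely illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'movie_id': [2, 3, 4],
                   'genres': [[1, 2], [1, 3], [2, 4]]})

# explode: one row per (movie, genre) pair; then collect movie ids per genre
movies_by_genre = (df.explode('genres')
                     .groupby('genres')['movie_id']
                     .agg(list))
print(movies_by_genre)
```

Each movie shows up in one group per genre it has, which is the "with duplication" behaviour described above.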

IIUC, you can use explode and map:

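import pandas as pd

df = pd.DataFrame({'movie_id': [2, 3, 4],
                   'genres': [[1, 2], [1, 3], [2, 4]]})

# One row per (movie, genre); mergesort keeps movie order stable within a genre
df1 = (df.explode('genres')
         .sort_values('genres', kind='mergesort')
         .rename(columns={'genres': 'genres_group'})
         .set_index('genres_group', append=True))

# Put the full genre list back by looking it up with the original row label
df1['genres'] = df1.index.get_level_values(0).map(df['genres'])

print(df1)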

                movie_id  genres
  genres_group                  
0 1                    2  [1, 2]
1 1                    3  [1, 3]
0 2                    2  [1, 2]
2 2                    4  [2, 4]
1 3                    3  [1, 3]
2 4                    4  [2, 4]
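A variant of the explode idea that skips the map step: copy the list column first and explode only the copy, so the original genres lists are carried along unchanged. This is a sketch assuming pandas 0.25+; genre_group and out are illustrative names, and the astype call assumes every list is non-empty (an empty list explodes to NaN, which cannot be cast to int64).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'movie_id': [2, 3, 4],
                   'genres': [[1, 2], [1, 3], [2, 4]]})

# Copy the list column, explode only the copy; 'genres' keeps the full list
out = (df.assign(genre_group=df['genres'])
         .explode('genre_group')
         .astype({'genre_group': 'int64'})
         .sort_values('genre_group', kind='mergesort')  # stable within a genre
         [['genre_group', 'movie_id', 'genres']])
print(out)
```

Sorting with kind='mergesort' keeps movies in their original order within each genre, which reproduces the row order of the desired output.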

Is this the result you are trying to achieve?

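import pandas as pd
# Spread each list into columns, stack to one genre per row, keep only the movie-row index level
genre_group = pd.Series(df.apply(lambda x: pd.Series(x['genres']), axis=1).stack().reset_index(level=1, drop=True), name='genre_group')
df = pd.concat([genre_group, df], axis=1)  # align on the row index: each movie repeats once per genre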

Output: (screenshot not preserved)
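The apply(pd.Series)/stack step in the second answer can also be written with Series.explode, which already returns one row per list element labelled with the original row index. The sketch below (assuming pandas 0.25+ and non-empty lists) uses DataFrame.join instead of pd.concat, because join matches each repeated label to its single movie row; result is an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'movie_id': [2, 3, 4],
                   'genres': [[1, 2], [1, 3], [2, 4]]})

# One genre per row; the index still points at the original movie row
genre_group = df['genres'].explode().astype('int64').rename('genre_group')

# Left join on the index: many genre rows map to one movie row
result = genre_group.to_frame().join(df)
print(result)
```

To get the genre-sorted layout from the question, follow it with result.sort_values('genre_group', kind='mergesort').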

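Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.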

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM