Efficient way to group pandas dataframe rows by a list of tags in a column

Question

Given a dataframe like:

import pandas as pd

df = pd.DataFrame(
        {
            'Movie':
            [
                'Star Trek',
                'Harry Potter',
                'Bohemian Rhapsody',
                'The Imitation Game',
                'The Avengers'
            ],
            'Genre':
            [
                'sci-fi; fiction',
                'fantasy; fiction; magic',
                'biography; drama; music',
                'biography; drama; thriller',
                'action; adventure; sci-fi'
            ]
        }
)

I'd like to group by the individual tags in the 'Genre' column and collect the movies for each tag as a list, like this:

                                                 0
magic                               [Harry Potter]
sci-fi                   [Star Trek, The Avengers]
fiction                  [Star Trek, Harry Potter]
drama      [Bohemian Rhapsody, The Imitation Game]
fantasy                             [Harry Potter]
music                          [Bohemian Rhapsody]
thriller                      [The Imitation Game]
action                              [The Avengers]
biography  [Bohemian Rhapsody, The Imitation Game]
adventure                           [The Avengers]

My current code works, but I'd like to know whether there is a more efficient way to do this, for example:

not needing to convert between list, dataframe and dictionary,
not needing to use a for loop (perhaps something like groupby)

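# Split each genre string into a list of tags
genre = df['Genre'].apply(lambda x: str(x).split("; ")).tolist()
movie = df['Movie'].tolist()

# Map each tag to the list of movies that carry it
data = dict()
for m, genres in zip(movie, genre):
    for g in genres:
        try:
            g_ = data[g]
        except KeyError:
            data[g] = [m]
        else:
            g_.append(m)

# Wrap each list so that from_dict stores it in a single cell
for key, value in data.items():
    data[key] = [value]

output = pd.DataFrame.from_dict(data, orient='index')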

Answer 1

It's easier if we first split the genre strings into lists, then explode the lists into one row per tag and group on that:

df['Genre'] = df.Genre.str.split('; ')
df.explode('Genre').groupby('Genre')['Movie'].apply(list)

Output

action                                [The Avengers]
adventure                             [The Avengers]
biography    [Bohemian Rhapsody, The Imitation Game]
drama        [Bohemian Rhapsody, The Imitation Game]
fantasy                               [Harry Potter]
fiction                    [Star Trek, Harry Potter]
magic                                 [Harry Potter]
music                            [Bohemian Rhapsody]
sci-fi                     [Star Trek, The Avengers]
thriller                        [The Imitation Game]
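
For reference, here is a self-contained sketch of the same split / explode / groupby idea that leaves the original 'Genre' strings untouched. The names exploded and movies_by_genre are only illustrative, and it assumes pandas 0.25 or later (where explode is available) and that tags are always separated by exactly "; ".

```python
import pandas as pd

df = pd.DataFrame({
    "Movie": [
        "Star Trek",
        "Harry Potter",
        "Bohemian Rhapsody",
        "The Imitation Game",
        "The Avengers",
    ],
    "Genre": [
        "sci-fi; fiction",
        "fantasy; fiction; magic",
        "biography; drama; music",
        "biography; drama; thriller",
        "action; adventure; sci-fi",
    ],
})

# assign() returns a new frame, so df["Genre"] keeps its original strings.
# explode() then produces one row per (Movie, tag) pair.
exploded = df.assign(Genre=df["Genre"].str.split("; ")).explode("Genre")

# Collect the movies for each tag; groupby sorts the tags alphabetically.
movies_by_genre = exploded.groupby("Genre")["Movie"].apply(list)

# Optional: a one-column DataFrame instead of a Series.
output = movies_by_genre.to_frame()

print(movies_by_genre)
```

The result is a Series indexed by tag. Calling to_frame() on it gives a one-column DataFrame, which is closer to the layout asked for in the question.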

Answer 1: 2 votes, accepted, 2020-09-24 13:37:23
