Efficient way to group pandas dataframe rows by a list of tags in a column

Question

Given a dataframe like:

import pandas as pd

df = pd.DataFrame(
        {
            'Movie':
            [
                'Star Trek',
                'Harry Potter',
                'Bohemian Rhapsody',
                'The Imitation Game',
                'The Avengers'
            ],
            'Genre':
            [
                'sci-fi; fiction',
                'fantasy; fiction; magic',
                'biography; drama; music',
                'biography; drama; thriller',
                'action; adventure; sci-fi'
            ]
        }
)

I'd like to group by the individual tags in the 'Genre' column and collect the movies as lists like:

                                                 0
magic                               [Harry Potter]
sci-fi                   [Star Trek, The Avengers]
fiction                  [Star Trek, Harry Potter]
drama      [Bohemian Rhapsody, The Imitation Game]
fantasy                             [Harry Potter]
music                          [Bohemian Rhapsody]
thriller                      [The Imitation Game]
action                              [The Avengers]
biography  [Bohemian Rhapsody, The Imitation Game]
adventure                           [The Avengers]

My current code (below) works, but I'd like to know whether there is a more efficient way to do this, e.g.:

- without converting between list, dataframe and dictionary,
- without a for loop (perhaps using something like groupby).

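# Split each 'Genre' string into a list of individual tags
genre = df['Genre'].apply(lambda x: str(x).split("; ")).tolist()
movie = df['Movie'].tolist()

# Map each tag to the list of movies that carry it
data = dict()
for m, genres in zip(movie, genre):
    for g in genres:
        try:
            g_ = data[g]
        except KeyError:
            data[g] = [m]
        else:
            g_.append(m)

# Wrap each list so from_dict stores it in a single cell
for key, value in data.items():
    data[key] = [value]

output = pd.DataFrame.from_dict(data, orient='index')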

Answer 1

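This is easier if we first split the genres into lists. explode (available since pandas 0.25) then gives each (movie, genre) pair its own row, and groupby collects the movies per genre: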

df['Genre'] = df.Genre.str.split('; ')
df.explode('Genre').groupby('Genre')['Movie'].apply(list)

Output

action                                [The Avengers]
adventure                             [The Avengers]
biography    [Bohemian Rhapsody, The Imitation Game]
drama        [Bohemian Rhapsody, The Imitation Game]
fantasy                               [Harry Potter]
fiction                    [Star Trek, Harry Potter]
magic                                 [Harry Potter]
music                            [Bohemian Rhapsody]
sci-fi                     [Star Trek, The Avengers]
thriller                        [The Imitation Game]
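
Note that the first line of the snippet above overwrites df['Genre'] with lists. If the original strings are still needed, a possible variant (a sketch, not the only way) is to do the split inside assign, which works on a copy. The names by_genre and output below are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Movie': ['Star Trek', 'Harry Potter', 'Bohemian Rhapsody',
              'The Imitation Game', 'The Avengers'],
    'Genre': ['sci-fi; fiction', 'fantasy; fiction; magic',
              'biography; drama; music', 'biography; drama; thriller',
              'action; adventure; sci-fi'],
})

# assign() returns a new frame, so the 'Genre' strings in df stay intact.
by_genre = (
    df.assign(Genre=df['Genre'].str.split('; '))
      .explode('Genre')              # one row per (Movie, Genre) pair
      .groupby('Genre')['Movie']
      .agg(list)                     # collect the movies of each genre
)

# Optional: a one-column DataFrame, closer to the layout in the question.
output = by_genre.to_frame()
```

by_genre is a Series indexed by genre; to_frame() is only needed if a DataFrame is required downstream.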

solution1
2 ACCEPTED 2020-09-24 13:37:23