Efficient way to group pandas dataframe rows by a list of tags in a column
Given a DataFrame like:
import pandas as pd

df = pd.DataFrame(
    {
        'Movie': [
            'Star Trek',
            'Harry Potter',
            'Bohemian Rhapsody',
            'The Imitation Game',
            'The Avengers'
        ],
        'Genre': [
            'sci-fi; fiction',
            'fantasy; fiction; magic',
            'biography; drama; music',
            'biography; drama; thriller',
            'action; adventure; sci-fi'
        ]
    }
)
I'd like to group by the individual tags in the 'Genre' column and collect the movies as lists, like this:
0
magic [Harry Potter]
sci-fi [Star Trek, The Avengers]
fiction [Star Trek, Harry Potter]
drama [Bohemian Rhapsody, The Imitation Game]
fantasy [Harry Potter]
music [Bohemian Rhapsody]
thriller [The Imitation Game]
action [The Avengers]
biography [Bohemian Rhapsody, The Imitation Game]
adventure [The Avengers]
My current code works, but I'd like to know if there is a more efficient way to do this (e.g. with groupby).
# Split each 'Genre' string into a list of tags
genre = df['Genre'].apply(lambda x: str(x).split("; ")).tolist()
movie = df['Movie'].tolist()
data = dict()
# Map each tag to the list of movies that carry it
for m, genres in zip(movie, genre):
    for g in genres:
        try:
            g_ = data[g]
        except KeyError:
            data[g] = [m]
        else:
            g_.append(m)
# Wrap each list so that from_dict stores it in a single cell
for key, value in data.items():
    data[key] = [data[key]]
output = pd.DataFrame.from_dict(data, orient='index')
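For comparison, the same loop can be sketched more compactly with collections.defaultdict, which removes the try/except bookkeeping. This is only an illustrative variant, not necessarily faster: the name movies_by_genre is made up for this example, a shortened copy of the data is used, and the result is returned as a Series rather than the one-column DataFrame built above.

```python
from collections import defaultdict

import pandas as pd

# Shortened copy of the example data (assumption: same '; ' separator)
df = pd.DataFrame({
    'Movie': ['Star Trek', 'Harry Potter', 'The Avengers'],
    'Genre': [
        'sci-fi; fiction',
        'fantasy; fiction; magic',
        'action; adventure; sci-fi',
    ],
})

# Missing keys start out as empty lists, so no try/except is needed
movies_by_genre = defaultdict(list)
for movie, genres in zip(df['Movie'], df['Genre']):
    for genre in genres.split('; '):
        movies_by_genre[genre].append(movie)

# One row per tag, each holding the list of matching movies
output = pd.Series(dict(movies_by_genre), name='Movie')
print(output)
```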
It's easier if we first split each 'Genre' string into a list:
df['Genre'] = df.Genre.str.split('; ')
df.explode('Genre').groupby('Genre')['Movie'].apply(list)
Output:
action [The Avengers]
adventure [The Avengers]
biography [Bohemian Rhapsody, The Imitation Game]
drama [Bohemian Rhapsody, The Imitation Game]
fantasy [Harry Potter]
fiction [Star Trek, Harry Potter]
magic [Harry Potter]
music [Bohemian Rhapsody]
sci-fi [Star Trek, The Avengers]
thriller [The Imitation Game]
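The split, explode and groupby steps above can be combined into a self-contained sketch. The helper name group_movies_by_genre is hypothetical, and the sketch assumes pandas 0.25 or later (where DataFrame.explode is available) and tags separated by exactly '; '. Unlike the snippet above, it uses assign so the original 'Genre' column is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({
    'Movie': ['Star Trek', 'Harry Potter', 'Bohemian Rhapsody',
              'The Imitation Game', 'The Avengers'],
    'Genre': ['sci-fi; fiction', 'fantasy; fiction; magic',
              'biography; drama; music', 'biography; drama; thriller',
              'action; adventure; sci-fi'],
})


def group_movies_by_genre(frame):
    """Return a Series mapping each tag to the list of movies that have it."""
    return (
        # Work on a copy with 'Genre' split into lists; `frame` is left as-is
        frame.assign(Genre=frame['Genre'].str.split('; '))
        # One row per (movie, tag) pair
        .explode('Genre')
        # Collect the movies for each tag; groupby sorts the tags by default
        .groupby('Genre')['Movie']
        .agg(list)
    )


result = group_movies_by_genre(df)
print(result)
```

Because groupby sorts its keys by default, the tags come back in alphabetical order, matching the output shown above.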