Pandas DataFrame groupby on a column containing lists

Question

I'm using Jupyter notebooks, and my current dataframe looks like the following:

players_mentioned  |  tweet_text    |  polarity
______________________________________________
[Mane, Salah]      |  xyz           |    0.12
[Salah]            |  asd           |    0.06

How can I group all players individually and average their polarity?

Currently I have tried to use:

df.groupby(df['players_mentioned'].map(tuple))['polarity'].mean()

But this groups each distinct combination of players (for example, Mane and Salah mentioned together) as well as each player mentioned alone. How can I best split the players up and then group them individually?

An expected output would contain:

 player  | polarity_average
____________________________
  Mane   |   0.12
  Salah  |   0.09

In other words, how do I group by each item in the lists in every row?

Answer 1

If you just want to group by players_mentioned and get the mean polarity score for each player, this should do it.

df.groupby('players_mentioned').polarity.agg('mean')

Answer 2

You can use the unnesting idiom from this answer.

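import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of that row's list.
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column into one long column.
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    # Join the remaining columns back on the repeated index.
    return df1.join(df.drop(columns=explode), how='left')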

You can now call groupby on the unnested "players_mentioned" column.

(unnesting(df, ['players_mentioned'])
    .groupby('players_mentioned', as_index=False)['polarity'].mean())

  players_mentioned  polarity
0              Mane      0.12
1             Salah      0.09
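
As an alternative to the helper above (not part of the original answer), pandas 0.25 and later provide `DataFrame.explode`, which turns each list element into its own row. The following is a minimal sketch that assumes the sample data from the question; the output column names `player` and `polarity_average` are taken from the expected output there.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "players_mentioned": [["Mane", "Salah"], ["Salah"]],
    "tweet_text": ["xyz", "asd"],
    "polarity": [0.12, 0.06],
})

result = (
    df.explode("players_mentioned")  # one row per (tweet, player) pair
      .groupby("players_mentioned", as_index=False)["polarity"]
      .mean()                        # average polarity per player
      .rename(columns={"players_mentioned": "player",
                       "polarity": "polarity_average"})
)
print(result)
```

Selecting `['polarity']` before calling `mean()` keeps the text column out of the aggregation, which matters on recent pandas versions where averaging a string column raises an error.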

2 answers

Answer 1: 0, 2019-04-01 20:13:05

Answer 2: 0, accepted, 2019-04-01 20:14:06