Create table by grouping mean values by column and list of one-hot encoded columns (Python, pandas)

I am working with tweets and I would like to report the mean sentiment score by topic and by community.

This is what my dataframe looks like, where each row is a document (tweet):

tweet_text        sentiment  community_id   topic_1   topic_2   topic_3    ...    topic_k
"blah blah blah"      0.7      1233             1       0         0        ...       1
"blah blah blah"     -0.4      9845             0       1         1        ...       0
"blah blah blah"      0.1      1233             1       0         1        ...       0

I want to create a dataframe that contains a mean sentiment value in each cell, like this:

community_id   topic 1   topic 2   topic 3   ...    topic k
 1233           0.1       -0.8       0.5     ...       0.9
 9845          -0.3        0.2       0.4     ...       0.1
 ...            ...        ...       ...     ...       ...

Any thoughts on how to go about this, please? Thanks!

IIUC, first you want to propagate the sentiment through the topic columns, then average by community_id:

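(df.filter(like='topic')           # keep only the one-hot topic_* columns
   .mul(df.sentiment, axis=0)      # multiply each row's topic flags by that row's sentiment
   .groupby(df.community_id)       # group the rows by community
   .mean()                         # average within each community
)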
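One thing worth checking with the snippet above: a tweet that does not carry a topic contributes a 0 to that topic's column, so the mean is taken over all tweets in the community rather than only the tagged ones. Below is a minimal, self-contained sketch on made-up toy data (the values and the shortened topic list are assumptions, not the asker's data) that compares that behaviour with a masked variant. The masked variant is only a guess at the intended result; it turns untagged cells into NaN so that `mean()` skips them.

```python
import pandas as pd

# Toy data mirroring the question's layout (values are made up for illustration).
df = pd.DataFrame({
    "tweet_text": ["blah", "blah", "blah"],
    "sentiment": [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1": [1, 0, 1],
    "topic_2": [0, 1, 0],
    "topic_3": [0, 1, 1],
})

topics = df.filter(like="topic")               # the one-hot topic columns
weighted = topics.mul(df["sentiment"], axis=0)  # flag * sentiment, row by row

# Variant A (as in the answer): untagged tweets count as 0 in the average.
diluted = weighted.groupby(df["community_id"]).mean()

# Variant B (assumed intent): blank out untagged cells so that mean()
# averages only over tweets that actually carry the topic.
masked = weighted.where(topics.eq(1)).groupby(df["community_id"]).mean()

print(diluted)
print(masked)
```

On this toy frame, community 1233 has two tweets, only one of which is tagged with topic_3 (sentiment 0.1). Variant A reports 0.05 for that cell, while variant B reports 0.1. In variant B, cells where a community has no tweets for a topic come out as NaN instead of 0.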
