Create table by grouping mean values by column and list of one-hot encoded columns (Python, pandas)
I am working with tweets and I would like to report the mean sentiment score by topic and by community.
This is what my dataframe looks like, where each row is a document (tweet):
tweet_text        sentiment  community_id  topic_1  topic_2  topic_3  ...  topic_k
"blah blah blah"  0.7        1233          1        0        0        ...  1
"blah blah blah"  -0.4       9845          0        1        1        ...  0
"blah blah blah"  0.1        1233          1        0        1        ...  0
I want to create a dataframe that contains a mean sentiment value in each cell, like this:
community_id  topic_1  topic_2  topic_3  ...  topic_k
1233          0.1      -0.8     0.5      ...  0.9
9845          -0.3     0.2      0.4      ...  0.1
...           ...      ...      ...      ...  ...
Any thoughts on how to go about this? Thanks!
IIUC, you first want to propagate the sentiment through the topic columns (multiply each one-hot topic column by the tweet's sentiment), then average by community_id:
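# Multiply each one-hot topic column by the tweet's sentiment,
# then take the mean within each community.
(df.filter(like='topic')
   .mul(df['sentiment'], axis=0)
   .groupby(df['community_id'])
   .mean()
)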
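One thing to watch with the snippet above: a tweet that is not tagged with a topic still contributes a 0 to that topic's mean. If the goal is instead the mean sentiment over only the tweets that carry the topic (an assumption about the intent, since the question does not say), a minimal sketch is to mask the untagged cells to NaN before averaging. The toy frame below mirrors the three sample rows from the question, with `topic_k` left out; the names `mean_with_zeros` and `mean_tagged_only` are made up for illustration.

```python
import pandas as pd

# Toy data mirroring the three sample rows in the question (topic_k omitted).
df = pd.DataFrame({
    "tweet_text": ["blah blah blah"] * 3,
    "sentiment": [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1": [1, 0, 1],
    "topic_2": [0, 1, 0],
    "topic_3": [0, 1, 1],
})

# One-hot topic columns only (tweet_text does not contain "topic").
topics = df.filter(like="topic")

# Sentiment copied into every topic column the tweet is tagged with, 0 elsewhere.
weighted = topics.mul(df["sentiment"], axis=0)

# Variant A: same as the answer; untagged tweets count as 0 in the mean.
mean_with_zeros = weighted.groupby(df["community_id"]).mean()

# Variant B (assumed intent): untagged cells become NaN, which mean() skips,
# so each cell averages only the tweets tagged with that topic.
mean_tagged_only = weighted.where(topics.eq(1)).groupby(df["community_id"]).mean()

print(mean_with_zeros)
print(mean_tagged_only)
```

On this toy data the two variants differ for community 1233 and topic_3: variant A gives 0.05 (the untagged tweet counts as 0), while variant B gives 0.1. In variant B, a community with no tweets for a topic shows NaN rather than 0.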