How to get the most common words for a specific value in a Python dataframe

Question

I have a dataframe with scores of 0 and 1 and the corresponding reviews. I want to find the most common words in the reviews with a score of 0 and in the reviews with a score of 1. I tried this, but it gives the count of all words:

from collections import defaultdict

count = defaultdict(int)
l = df['Summary']
for number in l:
    count[number] += 1

print(count)

How can I find the most common values across all the rows with a score of 1 and all the rows with a score of 0?

Answer 1

Try using a frequency dict. If your columns can be viewed as a list of lists:

data = [[0, "text sample 1"], [0, "text sample 2"], [1, "text sample 3"]]

...then you can:

fd0 = dict()
fd1 = dict()
for list_item in data:
    associated_value = list_item[0]

    # split() with no argument splits the string into a list of words on any whitespace
    for word in list_item[1].split():
        if associated_value == 0:
            fd0[word] = fd0.get(word, 0) + 1
        elif associated_value == 1:
            fd1[word] = fd1.get(word, 0) + 1

At the end of the loop, fd0 holds the word frequencies for label 0 and fd1 holds the word frequencies for label 1.
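
The loop above stops once the frequency dicts are built. A minimal sketch of the remaining step, picking the most frequent words out of such a dict, might look like this. The helper name top_words and the sample dict contents are hypothetical, chosen only to resemble what fd0 and fd1 could contain.

```python
# Hypothetical frequency dicts, shaped like fd0 and fd1 after the loop.
fd0 = {"text": 2, "sample": 2, "1": 1, "2": 1}
fd1 = {"text": 1, "sample": 1, "3": 1}


def top_words(freq, n=2):
    """Return the n (word, count) pairs with the highest counts."""
    # sorted() is stable, so words with equal counts keep their dict order.
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_words(fd0))
print(top_words(fd1, n=1))
```

If only the single most frequent word is needed, max(fd0, key=fd0.get) would also work.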

Answer 2

Assuming your data looks like this:

            review  score
0       bad review      0
1      good review      1
2  very bad review      0
3   movie was good      1

You could do something like:

import pandas as pd

# Build one Series per review (index = words, value = score), then stack them
words = pd.concat([pd.Series(row['score'], row['review'].split(' '))
                   for _, row in df.iterrows()]).reset_index()
words.columns = ['word', 'score']
print(words.groupby(['score', 'word']).size())

which gives you:

score  word
0      bad       2
       review    2
       very      1
1      good      2
       movie     1
       review    1
       was       1
dtype: int64
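
The table above lists every word count per score, not just the most common one. One possible way to narrow it down, sketched here under the assumption that pandas 0.25 or later is available (for DataFrame.explode), is to count words within each score and keep only the first row of each group. The variable names are illustrative, and the sample data is copied from the table in this answer.

```python
import pandas as pd

# Sample data copied from the table in this answer.
df = pd.DataFrame({
    "review": ["bad review", "good review", "very bad review", "movie was good"],
    "score": [0, 1, 0, 1],
})

# One row per (score, word): split each review into a list, then explode it.
words = df.assign(word=df["review"].str.split()).explode("word")

# Count words within each score; counts are sorted descending inside each group.
counts = words.groupby("score")["word"].value_counts()

# Keep the top row of each score group (increase 1 to keep more words).
top = counts.groupby(level="score").head(1)
print(top)
```

Note that ties are not resolved in any particular way here: for score 0, "bad" and "review" both appear twice, so either may be reported.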

Answer 3

from collections import Counter

# Concatenate all review text for each score into one long string
most_common_0 = ''
most_common_1 = ''

for text, score in zip(df['Summary'], df['Score']):
    if score == 1:
        most_common_1 += ' ' + text
    else:
        most_common_0 += ' ' + text

# Count words in the score-1 text; use most_common_0 for the score-0 reviews
c = Counter(most_common_1.split())
print(c.most_common(2)) # change this 2 to the number you want to analyze

Output

[('good', 2), ('and', 1)]
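
The snippet in this answer prints only the score-1 result. A rough sketch that wraps the same Counter idea in a function and returns the top words for every score could look like the following. The function name most_common_by_score and the sample reviews are hypothetical; with the dataframe from the question you would presumably pass df['Summary'] and df['Score'] instead of the plain lists.

```python
from collections import Counter


def most_common_by_score(texts, scores, n=2):
    """Return {score: [(word, count), ...]} with the n most frequent words per score."""
    counters = {}
    for text, score in zip(texts, scores):
        # Create a Counter for a score the first time it is seen, then add the words.
        counters.setdefault(score, Counter()).update(text.split())
    return {score: counter.most_common(n) for score, counter in counters.items()}


# Hypothetical sample data standing in for the Summary and Score columns.
summaries = ["good and tasty", "good value", "bad taste", "bad packaging"]
scores = [1, 1, 0, 0]

result = most_common_by_score(summaries, scores)
print(result[1])
print(result[0])
```

Keeping one Counter per score avoids building a long concatenated string and works for any number of distinct score values.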

Answer 1
0 2019-05-17 21:54:42

Answer 2
0 2019-05-17 22:05:01

Answer 3
0 2019-05-17 22:12:31
