How to get the most common words for a specific value in a DataFrame in Python
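I have a DataFrame with scores of 0 and 1 and the corresponding reviews. I want to find the most common words in the reviews scored 0 and in the reviews scored 1. I tried this, but it gives the counts for all words: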
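from collections import defaultdict

count = defaultdict(int)
l = df['Summary']
# counts every value in the Summary column, regardless of score
for number in l:
    count[number] += 1
print(count)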
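How can I find the most common values across all rows with score 1 and across all rows with score 0?
Try using a frequency dictionary. If your columns can be viewed as a list of lists: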
data = [[0, "text sample 1"], [0, "text sample 2"], [1, "text sample 3"]]
...then you can do:
fd0 = dict()
fd1 = dict()
for list_item in data:
    associated_value = list_item[0]
    # note: split(' ') splits the string into a list of words
    for word in list_item[1].split(' '):
        if associated_value == 0:
            fd0[word] = 1 if word not in fd0 else fd0[word] + 1
        elif associated_value == 1:
            fd1[word] = 1 if word not in fd1 else fd1[word] + 1
At the end of the loop, fd0 should hold the word frequencies for label 0 and fd1 the word frequencies for label 1.
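The same idea can be written more compactly with collections.Counter. The following is only a sketch under the same list-of-lists assumption; the name freq_by_label is made up for illustration and does not come from the answer above.

```python
from collections import Counter, defaultdict

# Same shape as the example above: [label, text] pairs.
data = [[0, "text sample 1"], [0, "text sample 2"], [1, "text sample 3"]]

# One Counter per label; defaultdict creates it on first access,
# so labels other than 0 and 1 would be handled as well.
freq_by_label = defaultdict(Counter)
for label, text in data:
    # split() with no argument splits on any run of whitespace
    freq_by_label[label].update(text.split())

# The two most frequent words among the label-0 texts.
top_for_0 = freq_by_label[0].most_common(2)
print(top_for_0)
```

Keying the counters by label avoids one hard-coded dictionary per score, which may help if more score values appear later.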
Assuming your data looks like this
            review  score
0       bad review      0
1      good review      1
2  very bad review      0
3   movie was good      1
you could do something like
import pandas as pd

# one Series per row: index = the row's words, value = the row's score
words = pd.concat([pd.Series(row['score'], row['review'].split(' '))
                   for _, row in df.iterrows()]).reset_index()
words.columns = ['word', 'score']
print(words.groupby(['score', 'word']).size())
which gives you
score  word
0      bad       2
       review    2
       very      1
1      good      2
       movie     1
       review    1
       was       1
dtype: int64
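The grouped counts above list every word. To keep only the most frequent words per score, one possible sketch is shown below. It assumes the same sample data, swaps the iterrows loop for str.split plus explode (available in pandas 0.25 and later), and uses the illustrative names top_n and top_words, none of which come from the answer itself.

```python
import pandas as pd

df = pd.DataFrame({
    "review": ["bad review", "good review", "very bad review", "movie was good"],
    "score": [0, 1, 0, 1],
})

# One row per (score, word): split each review into a list, then explode it.
words = (
    df.assign(word=df["review"].str.split())
      .explode("word")[["score", "word"]]
)

# Same counts as the groupby in the answer above.
counts = words.groupby(["score", "word"]).size()

# Sort by count, then keep the first top_n entries within each score.
top_n = 2
top_words = (
    counts.sort_values(ascending=False)
          .groupby(level="score")
          .head(top_n)
)
print(top_words)
```

Words with equal counts may come out in any order, so the second entry for score 1 is not guaranteed to be a particular word.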
from collections import Counter

most_common_0 = ''
most_common_1 = ''
# concatenate the summaries into one string per score
for text, score in zip(df['Summary'], df['Score']):
    if score == 1:
        most_common_1 += ' ' + text
    else:
        most_common_0 += ' ' + text

c = Counter(most_common_1.split())
print(c.most_common(2))  # change this 2 to the number you want to analyze
Output
[('good', 2), ('and', 1)]
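The snippet above only reports the words for score 1. A small helper can cover either score. This is a sketch rather than part of the answer: the function name most_common_words and the sample rows are hypothetical, and only the column names Summary and Score are taken from the question.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Summary": ["good and tasty", "bad taste", "good value", "bad smell"],
    "Score": [1, 0, 1, 0],
})

def most_common_words(frame, score, n=2):
    """Return the n most frequent words among rows whose Score equals `score`."""
    summaries = frame.loc[frame["Score"] == score, "Summary"]
    # Join the selected summaries with spaces, then split on whitespace.
    return Counter(" ".join(summaries).split()).most_common(n)

print(most_common_words(df, 1))
print(most_common_words(df, 0))
```

Filtering with a boolean mask before counting avoids building one string per score by hand, and the same call works for any value stored in the Score column.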