How to categorize text in a pandas dataframe based on the number of positive and negative keywords
How to go through a dataframe and classify text as either positive or negative?
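I currently have a pandas dataframe containing tokenized tweets.
I need to go through each tweet and determine whether it is positive or negative, so that I can add further columns holding the number of positive and negative words it contains.
Example data: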
tokenized_tweets = ['football, was, good, we, played, well' , 'We, were, unlucky, today, bad, luck' , 'terrible, performance, bad, game']
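I need to run a loop over tokenized_tweets and work out whether each tweet is positive or negative.
For this example, the positive and negative words are as follows: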
Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']
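The desired output is a dataframe containing each tweet, how many positive words it contains, how many negative words it contains, and whether the tweet is positive, negative or neutral overall.
The overall label should be decided by whether the tweet has more positive or more negative keywords; equal counts mean neutral.
Desired output: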
Tokenized tweet                          positive words  negative words  overall
football, was, good, we, played, well    1               0               positive
We, were, unlucky, today, bad, luck      0               1               negative
terrible, performance, bad, game         0               2               negative
import pandas as pd
import numpy as np

df = pd.DataFrame({'tokenized_tweets': ['football, was, good, we, played, well', 'We, were, unlucky, today, bad, luck', 'terrible, performance, bad, game']})
Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

# Join each word list into a regex alternation ('good|great', 'terrible|bad')
# and count how many times it matches in every tweet.
df['positive words'] = df['tokenized_tweets'].str.count('|'.join(Positive_words))
df['negative words'] = df['tokenized_tweets'].str.count('|'.join(Negative_words))

# Compare the two counts row by row; np.select picks the first matching choice.
conditions = [
    (df['positive words'] > df['negative words']),
    (df['negative words'] > df['positive words']),
    (df['negative words'] == df['positive words'])
]
choices = [
    'positive',
    'negative',
    'neutral'
]
df['overall'] = np.select(conditions, choices, default='')
df
Out:
   tokenized_tweets                       positive words  negative words  overall
0  football, was, good, we, played, well  1               0               positive
1  We, were, unlucky, today, bad, luck    0               1               negative
2  terrible, performance, bad, game       0               2               negative
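One caveat worth noting: `str.count` with a joined pattern counts regex matches anywhere in the string, so a token such as `badly` would also be counted as `bad`, and matching is case-sensitive. If whole-token matching is wanted, a minimal sketch could split each tweet on commas and test set membership instead. The helper `count_matches`, the lowercase variable names and the fourth sample row are hypothetical additions for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'tokenized_tweets': [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'terrible, performance, bad, game',
    'a, badly, timed, pass',  # hypothetical row: 'badly' should not count as 'bad'
]})

# Sets give fast membership tests; names are illustrative.
positive_words = {'good', 'great'}
negative_words = {'terrible', 'bad'}

def count_matches(text, vocabulary):
    """Count tokens in a comma-separated string that appear in vocabulary."""
    tokens = [token.strip().lower() for token in text.split(',')]
    return sum(token in vocabulary for token in tokens)

df['positive words'] = df['tokenized_tweets'].apply(
    lambda text: count_matches(text, positive_words))
df['negative words'] = df['tokenized_tweets'].apply(
    lambda text: count_matches(text, negative_words))

# Equal counts (including zero of each) fall through to the default label.
conditions = [
    df['positive words'] > df['negative words'],
    df['negative words'] > df['positive words'],
]
df['overall'] = np.select(conditions, ['positive', 'negative'], default='neutral')
print(df)
```

With this variant the extra row is labelled neutral, whereas the substring-based count would have found one negative match in `badly`.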