
Labelling data for sentiment analysis using word-list files

I have tokenized data in after_tokenize.xlsx, and I want to label each row as positive or negative. If a row contains mostly words from the positive list in positive.xlsx, it should be labelled positive; if it contains mostly words from the negative list, it should be labelled negative. The result should go into a new column named label. Sample:

data                           label
[i, like, love, hate, you]     positive
[i, worst, hate, like, you]    negative
import pandas as pd
import nltk

df = pd.DataFrame({'data': ['i like love hate you', 'i dont hate like you']})
pos = pd.DataFrame(data=['like', 'love'], columns=['positive'])
neg = pd.DataFrame(data=['dont', 'hate'], columns=['negative'])
df['data'] = df.apply(lambda row: nltk.word_tokenize(row['data']), axis=1)

You can use set() and the intersection operation set(...) & set(...) to get the words that appear in both lists.
Then you can count them using len():

len( set(['i', 'like', 'love', 'hate', 'you']) & set(['like', 'love']) )
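
As a minimal standalone sketch of the same idea, without pandas: the token list and the two word lists below are just the sample values from the question, and the names pos_count and neg_count are only illustrative.

```python
# Sample tokens and word lists taken from the question.
tokens = ['i', 'like', 'love', 'hate', 'you']
pos = ['like', 'love']
neg = ['dont', 'hate']

# The intersection keeps only the words present in both collections.
pos_count = len(set(tokens) & set(pos))
neg_count = len(set(tokens) & set(neg))

print(pos_count, neg_count)  # 2 1
```

Because sets drop duplicates, each distinct word is counted at most once per row.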

import pandas as pd
import nltk

df = pd.DataFrame({'data': ['i like love hate you', 'i dont hate like you']})

pos = ['like', 'love']
neg = ['dont', 'hate']

#print(df)

df['data'] = df['data'].apply(nltk.word_tokenize)

df['pos words'] = df['data'].apply(lambda item: list(set(item) & set(pos)))
df['neg words'] = df['data'].apply(lambda item: list(set(item) & set(neg)))

df['pos'] = df['data'].apply(lambda item: len(set(item) & set(pos)))
df['neg'] = df['data'].apply(lambda item: len(set(item) & set(neg)))

# Default label, kept when the positive and negative counts are equal.
df['label'] = '???'
# Overwrite the default where one count is strictly larger than the other.
df.loc[df['pos'] > df['neg'], 'label'] = 'positive'
df.loc[df['pos'] < df['neg'], 'label'] = 'negative'

print(df)

Result:

                         data     pos words     neg words  pos  neg     label
0  [i, like, love, hate, you]  [love, like]        [hate]    2    1  positive
1  [i, dont, hate, like, you]        [like]  [hate, dont]    1    2  negative
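
Note that the set-based count ignores repeated words, and rows with equal counts keep the '???' label. If repeats should count, one possible variation is sketched below. The helper name label_tokens and the 'neutral' tie label are assumptions made for illustration, not part of the question or the answer above.

```python
def label_tokens(tokens, pos_words, neg_words, tie_label='neutral'):
    """Label a token list by comparing positive and negative word counts.

    Hypothetical helper: unlike the set intersection, every occurrence
    of a word is counted, so repeated words add to the score.
    """
    pos_set, neg_set = set(pos_words), set(neg_words)
    pos_count = sum(1 for token in tokens if token in pos_set)
    neg_count = sum(1 for token in tokens if token in neg_set)
    if pos_count > neg_count:
        return 'positive'
    if neg_count > pos_count:
        return 'negative'
    return tie_label  # assumed label for equal counts


pos = ['like', 'love']
neg = ['dont', 'hate']

print(label_tokens(['i', 'like', 'love', 'hate', 'you'], pos, neg))
print(label_tokens(['hate', 'hate', 'like'], pos, neg))
```

With the DataFrame from the answer, this could be applied as df['label'] = df['data'].apply(label_tokens, args=(pos, neg)).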
