
Normalizing words for sentiment analysis

I'm currently working on sentiment analysis and have run into a problem.

I have a large word-normalization list, and I want to normalize the text before tokenizing it, as in this example:

data                          normal
kamu knp sayang               kamu kenapa sayang
drpd sedih mending belajar    dari pada sedih mending belajar
dmna sekarang                 di mana sekarang

  • knp: kenapa
  • drpd: dari pada
  • dmna: di mana
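The mapping above is a plain word-for-word lookup. A minimal sketch of that lookup in plain Python, assuming words are separated by whitespace; the names slang_map and normalize_sentence are hypothetical and not from the question:

```python
# Slang word -> normal form, taken from the list above
slang_map = {"knp": "kenapa", "drpd": "dari pada", "dmna": "di mana"}


def normalize_sentence(sentence: str) -> str:
    """Replace each whitespace-separated word found in slang_map; keep all other words."""
    return " ".join(slang_map.get(word, word) for word in sentence.split())


# The three rows of the example table
for text in ["kamu knp sayang", "drpd sedih mending belajar", "dmna sekarang"]:
    print(f"{text!r} -> {normalize_sentence(text)!r}")
```

Because str.split() with no argument splits on any run of whitespace, repeated spaces collapse to one in the output, and a word with punctuation attached (for example "knp?") would not match a key.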

This is my code:

import pandas as pd

slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})

# Build a lookup dict: slang word -> normalized form (first occurrence wins)
normalisasi = {}
for index, row in slang.iterrows():
    if row['before'] not in normalisasi:
        normalisasi[row['before']] = row['after']


def normalized_term(document):
    # Replace each element of document found in the lookup dict; keep the rest unchanged
    return [normalisasi[term] if term in normalisasi else term for term in document]

df['normal'] = df['data'].apply(normalized_term)
print(df)

But the result is not what I expected: every value in the normal column comes out as a list of single characters.

I want a result like the example table above.
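The character lists come from how Series.apply works: it passes each cell's whole string to normalized_term, and a list comprehension over a string iterates over its characters, none of which match a slang key. A minimal sketch of one fix, assuming whitespace tokenization, splits each string into words before the lookup and joins them back into a sentence; it also builds the dictionary with dict(zip(...)) instead of the iterrows loop:

```python
import pandas as pd

slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'],
                      'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})

# Lookup dict built directly from the two columns: slang word -> normal form
normalisasi = dict(zip(slang['before'], slang['after']))


def normalized_term(document):
    # Split the sentence into words first, so the lookup sees 'knp', not 'k', 'n', 'p'
    words = document.split()
    # Look each word up (falling back to the word itself) and rebuild the sentence
    return ' '.join(normalisasi.get(word, word) for word in words)


df['normal'] = df['data'].apply(normalized_term)
print(df)
```

One design difference: the original loop keeps the first mapping when a slang word appears twice in slang, whereas dict(zip(...)) keeps the last one. Call slang.drop_duplicates('before') first if the first mapping should win.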

pandas has a method, Series.str.replace, that replaces a substring with another string, or replaces matches of a regular-expression pattern. Its full documentation is in the pandas API reference. You can get your desired output like this:

import pandas as pd
slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})
# Apply each slang -> normal pair to the whole column; regex=False means a literal substring match
for idx, row in slang.iterrows():
    df['data'] = df['data'].str.replace(row['before'], row['after'], regex=False)
print(df)

Output:

                              data
0               kamu kenapa sayang
1  dari pada sedih mending bermain
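One caveat with the loop above: a plain substring replacement also rewrites matches inside longer words, and each pass runs on the output of the previous one. A sketch of an alternative, assuming slang words are delimited by regex word boundaries, does all replacements in a single pass using \b anchors and a callable replacement (Series.str.replace accepts a callable repl when regex=True). The token 'sknp' in the third row is made up purely to show the difference:

```python
import re

import pandas as pd

mapping = {'knp': 'kenapa', 'dmna': 'di mana', 'drpd': 'dari pada'}
# Third row uses a made-up token 'sknp' to show what the word boundaries change
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain', 'sknp knp']})

# One alternation of all slang words, each escaped and anchored with \b on both sides
pattern = r'\b(?:' + '|'.join(map(re.escape, mapping)) + r')\b'

# The callable receives each re.Match and returns the replacement text for that match
df['normal'] = df['data'].str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)
print(df)
```

Here 'sknp' stays untouched while the standalone 'knp' becomes 'kenapa'; the substring loop would instead turn 'sknp' into 'skenapa'. Writing to a new normal column also keeps the original data column, matching the layout of the example table.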
