Normalizing words for sentiment analysis
I'm currently doing sentiment analysis and have run into a problem.
I have a large normalization list of words, and I want to normalize the text before tokenizing it, as in this example:
data | normal
---|---
kamu knp sayang | kamu kenapa sayang
drpd sedih mending belajar | dari pada sedih mending belajar
dmna sekarang | di mana sekarang
This is my code:
import pandas as pd

slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})

normalisasi = {}
for index, row in slang.iterrows():
    if row[0] not in normalisasi:
        normalisasi[row[0]] = row[1]

def normalized_term(document):
    return [normalisasi[term] if term in normalisasi else term for term in document]

df['normal'] = df['data'].apply(normalized_term)
df
But the result looks like this: [image: result]
I want the result to look like the example table.
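A likely cause, assuming each cell in `data` is a plain string: `apply` passes the whole string to `normalized_term`, and iterating over a string yields single characters, so no lookup ever matches a full word. A minimal sketch of a fix is to split on whitespace first and join afterwards. The dict literal below is just the question's mapping written out directly.

```python
import pandas as pd

# Same mapping as in the question, written as a dict for brevity.
normalisasi = {'knp': 'kenapa', 'dmna': 'di mana', 'drpd': 'dari pada'}

def normalized_term(document):
    # Split the text into words, replace each word that has a mapping,
    # then join the words back into a single string.
    return ' '.join(normalisasi.get(term, term) for term in document.split())

df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})
df['normal'] = df['data'].apply(normalized_term)
print(df)
```

Because the lookup is done per word, a slang term only matches when it is a whole whitespace-separated token.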
pandas has a utility named str.replace that lets you replace a substring with another, or find and replace patterns. The full details are in the pandas documentation. Your desired output can be produced like this:
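import pandas as pd

slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})

# Apply each slang -> normal replacement to the whole column in turn.
for idx, row in slang.iterrows():
    # regex=False treats 'before' as a literal substring, not a pattern.
    df.data = df.data.str.replace(row['before'], row['after'], regex=False)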
Output:
data
0 kamu kenapa sayang
1 dari pada sedih mending bermain
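One caveat with plain substring replacement is that a slang term can also match inside a longer word. The following is a sketch of a whole-word variant, not part of the original answer: it builds one regular expression with `\b` word boundaries from the same `slang` table. The third row, `'knpa begitu'`, is a made-up string added only to show that a longer word containing `knp` is left alone.

```python
import re
import pandas as pd

slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
mapping = dict(zip(slang['before'], slang['after']))

# One pattern that matches any slang term, but only as a whole word.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, mapping)) + r')\b')

def normalize(text):
    # Replace every matched slang word with its normalized form.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# 'knpa begitu' is a hypothetical example, not from the question.
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain', 'knpa begitu']})
df['normal'] = df['data'].map(normalize)
print(df)
```

Compiling a single pattern also means each text is scanned once, rather than once per slang entry, which may matter when the normalization list is large.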