Removing punctuation for sentiment analysis in Python

Question

I have the following code that I wrote. It works great, but problems arise when I add sentences with commas, full stops, etc. I've researched this and can see strip() as a potential option to fix it, but I can't see where to add it. I have tried, but I just get error after error!

Thanks

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    # split() only splits on whitespace, so "beer," keeps its comma
    # and no longer matches the dictionary key "beer"
    split_text = text.split()
    result = 0.00
    for i in split_text:
        if i in dic:
            result += dic[i]
    return result


print(sentiment_analysis(sent_analysis, "the beer, wine and cider were    great"))
print(sentiment_analysis(sent_analysis, "the beer and the wine were great"))
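Since the question asks where strip() could go, here is a minimal sketch of that approach (not taken from the answer below): strip punctuation from both ends of each token after splitting, passing string.punctuation as the set of characters to remove. This only trims leading and trailing punctuation, so "beer," becomes "beer", but punctuation in the middle of a word is left alone.

```python
import string

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    result = 0.00
    for word in text.split():
        # Remove ASCII punctuation from both ends of the token, e.g. "beer," -> "beer"
        word = word.strip(string.punctuation)
        if word in dic:
            result += dic[word]
    return result

print(sentiment_analysis(sent_analysis, "the beer, wine and cider were    great"))
print(sentiment_analysis(sent_analysis, "the beer and the wine were great"))
```

Because result starts as the float 0.00, the totals print as floats (39.0 and 23.0).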

Answer 1

Regular expressions can be used to remove all non-alphanumeric characters from a string. In the code below, the pattern [^\w\s] matches any character that is not (as indicated by the ^ inside the brackets) a word character (letters, digits, and the underscore) or whitespace, and re.sub removes those characters. The return statement iterates through the split string, keeps the words that are keys in the dictionary, and returns the sum of their scores.

Regex \s

Regex \w
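To make the pattern's behaviour concrete, this short sketch runs the same substitution on the answer's example string and on a made-up string ("craft_beer don't", chosen only for illustration) that shows two edge cases: underscores count as word characters and survive, while apostrophes are removed and contractions get joined.

```python
import re

text = "the beer,% wine &*and cider @were great"
# [^\w\s] matches any single character that is neither a word character
# (letters, digits, underscore) nor whitespace; re.sub deletes each match
cleaned = re.sub(r"[^\w\s]", "", text)
print(cleaned)          # the beer wine and cider were great
print(cleaned.split())  # ['the', 'beer', 'wine', 'and', 'cider', 'were', 'great']

# Edge cases: the underscore is kept, the apostrophe is removed
print(re.sub(r"[^\w\s]", "", "craft_beer don't"))  # craft_beer dont
```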

import re

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    # Delete every character that is not a word character or whitespace
    s = re.sub(r'[^\w\s]', '', text)
    # Add up the scores of the words that appear in the dictionary
    return sum(dic[x] for x in s.split() if x in dic)

print(sentiment_analysis(sent_analysis, "the beer,% wine &*and cider @were great"))

Output: 39

This handles most punctuation, as the many different symbols in the example string show.

Answer 1 (votes: 1, 2016-04-16 14:26:31)