How do I remove the uninteresting words and characters with my script?
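I can't figure out what I'm doing wrong here. This is just one part of my project, and in this last part I'm trying to exclude punctuation and the uninteresting_words. The script runs all the way through, but it removes neither the punctuation nor the uninteresting_words. I've tried turning the punctuation into a list, but the result isn't split into individual items; it's just a list with all of the characters as a single item. As you can see in the code below, I tried saving punctuations.split() to a new variable called char, and I tried several if statements and ways of iterating over the words in file_contents.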
def calculate_frequencies(file_contents):   # file_contents is being passed in through another
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
        if words == uninteresting_words:
            pass
        if words.isalnum() and words != uninteresting_words:
            if words not in result:
                result[words]=1
            else:
                result[words]+=1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(char)
    print(uninteresting_words)

    #wordcloud-this part and after is ok and is working as expected with the code that follows
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
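As the comment says, you should use if words in uninteresting_words: instead. Comparing a string to a list with == is always False.
In any case, I don't think your input text gets split on the special characters in punctuations. str.split() splits on whitespace by default. Use words.strip(punctuations) to remove punctuation from both ends of each word.
You also don't need a triple-quoted string (''') here; that form is normally used for docstrings and multi-line text. Use ' or " and escape other characters as needed.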
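The points above can be checked in isolation. The following is a small sketch, not part of the original answer; the shortened uninteresting_words list and the sample strings are assumptions chosen for illustration.

```python
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
uninteresting_words = ["the", "a", "to"]  # shortened list, for illustration only

# split() with no arguments splits on whitespace. The punctuation string
# contains no whitespace, so the result is a list with a single item.
char = punctuations.split()
print(len(char))  # 1

# A string never equals a list, so == is always False here.
# Membership is tested with the `in` operator.
word = "the"
print(word == uninteresting_words)  # False
print(word in uninteresting_words)  # True

# strip() removes the given characters from both ends of the string only;
# punctuation inside a word is left alone.
print("(hello,".strip(punctuations))  # hello
print("don't".strip(punctuations))    # don't
```

If a list with one item per character is really needed, list(punctuations) produces it, although strip() accepts the string directly.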
import wordcloud  # third-party package, needed for the WordCloud calls below

def calculate_frequencies(file_contents):   # file_contents is being passed in through another
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    result = {}
    for words in file_contents.split():
        # remove leading and trailing punctuation from the token
        words = words.strip(punctuations)
        # skip tokens that were only punctuation, and the uninteresting words
        if not words or words in uninteresting_words:
            continue
        if words not in result:
            result[words]=1
        else:
            result[words]+=1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(punctuations)
    print(uninteresting_words)

    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
That should do it. This is the solution I needed.
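One gap that may still matter: the uninteresting words are all lowercase, so a capitalized token such as "The" would not be filtered by a case-sensitive membership test. Below is a minimal sketch of a variant that lowercases each token before the check. The helper name count_words, the shortened word set, and the sample sentence are hypothetical, and the wordcloud step is left out so the snippet runs on its own.

```python
from collections import Counter

PUNCTUATIONS = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
# Shortened set, for illustration only; a set makes membership tests fast.
UNINTERESTING = {"the", "a", "to", "is", "it", "of", "and"}

def count_words(text, skip=UNINTERESTING):
    """Hypothetical helper: count words after stripping punctuation and lowercasing."""
    counts = Counter()
    for token in text.split():
        # strip punctuation from both ends, then normalize case
        word = token.strip(PUNCTUATIONS).lower()
        # ignore tokens that were only punctuation, and the skipped words
        if word and word not in skip:
            counts[word] += 1
    return dict(counts)

print(count_words("The cat -- and the hat. A cat!"))  # {'cat': 2, 'hat': 1}
```

The returned dictionary has the same shape as result in the answer, so it could be passed to cloud.generate_from_frequencies in the same way.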