How do I remove the uninteresting words and characters with my script?
I cannot figure out what I'm doing wrong here. This is just part of my project, and for the final part I am trying to exclude punctuation and the uninteresting_words. The script runs in full, but it does not remove punctuation or uninteresting_words. I tried turning punctuations into a list, but the result is not a list with each character as a separate item; it is a list that holds all the characters as a single item. As you can see in the code below, I saved punctuations.split() as a new variable called char and tried several combinations of if statements and loops to work through the words in file_contents.
def calculate_frequencies(file_contents):
    # file_contents is being passed in through another part of the code
    # that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
        if words == uninteresting_words:
            pass
        if words.isalnum() and words != uninteresting_words:
            if words not in result:
                result[words]=1
            else:
                result[words]+=1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(char)
    print(uninteresting_words)
    # wordcloud: this part and after is ok and is working as expected with the code that follows
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
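As the comment said, you should use if words in uninteresting_words: instead. The test words == uninteresting_words compares a single string with the whole list, which is never true, so no word is ever filtered out.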
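To make the difference concrete, here is a minimal sketch. The word list is shortened and the sample word is made up for illustration; only the two comparison forms come from the question and the answer.

```python
# Shortened, illustrative word list; the list in the question is longer.
uninteresting_words = ["the", "a", "to"]
word = "the"  # made-up sample token

# A str is never equal to a list, so this comparison is always False.
equals_list = (word == uninteresting_words)

# Membership test: True when the word is one of the items in the list.
in_list = (word in uninteresting_words)

print(equals_list, in_list)
```

The same reasoning applies to words != uninteresting_words in the question: it is always True, so it filters nothing.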
I also don't think your input text is being split on punctuation. str.split() splits on whitespace by default, so a token such as "word," keeps its punctuation and fails the isalnum() check. Use words.strip(punctuations) to remove punctuation from both ends of each word.
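The behaviour of split() and strip() described here can be checked with a short sketch. The sample strings are made up for illustration; punctuations is the string from the question, written with escapes.

```python
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"

# split() with no argument splits on whitespace only. The punctuation string
# contains no whitespace, so the result is a list with a single item.
char = punctuations.split()
single_item = (len(char) == 1)

# strip() removes the given characters from both ends of the string,
# but leaves characters in the middle (the apostrophe) untouched.
stripped = "(don't!)".strip(punctuations)

# Tokens from a whitespace split keep their punctuation until it is stripped.
sample = "Hello, world! (This is a test.)"  # made-up sample text
tokens = [w.strip(punctuations) for w in sample.split()]

print(single_item, stripped, tokens)
```

If a list with one character per item is what was wanted, list(punctuations) produces it, although strip() accepts the string directly, so no list is needed.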
You also don't need a triple-quoted string (''', the form normally used for docstrings) for a one-line string. Use ' or " and escape other characters as needed.
import wordcloud  # third-party wordcloud package

def calculate_frequencies(file_contents):
    # file_contents is being passed in through another part of the code
    # that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # LEARNER CODE START HERE
    result = {}
    for words in file_contents.split():
        # Remove punctuation from both ends of the token
        words = words.strip(punctuations)
        # Skip tokens that were only punctuation, and uninteresting words
        if not words or words in uninteresting_words:
            continue
        if words not in result:
            result[words] = 1
        else:
            result[words] += 1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(punctuations)
    print(uninteresting_words)
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
That should do it.
This was the solution that I needed.
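For comparison, the counting step alone can be sketched more compactly with collections.Counter. This is a hedged variant, not the accepted solution: the function name count_interesting_words is hypothetical, the stop list is shortened for illustration, and the lowercasing step is an added assumption so that "The" and "the" are treated alike.

```python
from collections import Counter

PUNCTUATIONS = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
# Shortened, illustrative stop list; substitute the full list from the question.
UNINTERESTING_WORDS = {"the", "a", "to", "is", "it", "of", "and"}


def count_interesting_words(text):
    """Return a Counter of words, ignoring punctuation and uninteresting words."""
    counts = Counter()
    for token in text.split():
        # Strip punctuation from both ends, then lowercase (an assumption,
        # since the stop list is all lowercase).
        word = token.strip(PUNCTUATIONS).lower()
        # Skip tokens that were only punctuation, and uninteresting words.
        if word and word not in UNINTERESTING_WORDS:
            counts[word] += 1
    return counts


frequencies = count_interesting_words("The cat and the hat. The CAT!")
print(frequencies)
```

Using a set for the stop list makes each membership test constant time on average. Counter is a dict subclass, so the result should be usable wherever a plain dict of frequencies is expected.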
https://www.python.org/dev/peps/pep-0257/