How do I remove the uninteresting words and characters with my script?
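I can't figure out what I'm doing wrong here. This is just one part of my project: in the last part I'm trying to exclude punctuation and the uninteresting_words. The script runs to completion, but it doesn't remove the punctuation or the uninteresting_words. I tried turning the punctuation string into a list, but instead of a list of individual characters I get a list whose single item contains all of the characters. As you can see in the code below, I saved punctuations.split() to a new variable called char and tried several if statements and iteration approaches over the words in file_contents.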
def calculate_frequencies(file_contents):  # file_contents is being passed in through another
    # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
        if words == uninteresting_words:
            pass
        if words.isalnum() and words != uninteresting_words:
            if words not in result:
                result[words]=1
            else:
                result[words]+=1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(char)
    print(uninteresting_words)

    #wordcloud-this part and after is ok and is working as expected with the code that follows
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
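As the comment says, the check should be if words in uninteresting_words: because comparing a single word to the whole list with == is never true.
Beyond that, I think the real problem is that your input text is never split on the special characters in punctuations. str.split() splits on whitespace by default, so every word keeps whatever punctuation is attached to it. Use words.strip(punctuations) to remove punctuation from both ends of each word.
I would also avoid a docstring-style triple-quoted string (''') here. Use ' or " and escape the other characters as needed.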
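The three points above can be checked in isolation. The following is only a small sketch: the shortened word list and the sample tokens are made up for illustration and are not part of the original assignment.

```python
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
uninteresting_words = ["the", "a", "to"]  # shortened list, for illustration only

# split() without arguments splits on whitespace. The punctuation string
# contains no whitespace, so the result is a list with ONE item.
chars = punctuations.split()
single_item = len(chars) == 1

# A string is never equal to a list, so == is always False here;
# membership has to be tested with "in".
word = "the"
equals_list = (word == uninteresting_words)
in_list = (word in uninteresting_words)

# isalnum() is False as soon as any punctuation is attached, which is why
# the original condition dropped such words instead of cleaning them.
has_punctuation = not "hello!".isalnum()

# strip() removes the given characters from both ends only;
# punctuation inside a word is left alone.
stripped = "(hello!)".strip(punctuations)
interior = "don't".strip(punctuations)

print(chars, equals_list, in_list, stripped, interior)
```

If individual punctuation characters are really wanted as a list, list(punctuations) produces one item per character, but strip() accepts the string directly, so no list is needed.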
def calculate_frequencies(file_contents):  # file_contents is being passed in through another
    # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    result = {}
    for words in file_contents.split():
        # Remove leading and trailing punctuation from each whitespace-separated token
        words = words.strip(punctuations)
        # Skip tokens that were nothing but punctuation, and the uninteresting words
        if not words or words in uninteresting_words:
            pass
        else:
            if words not in result:
                result[words] = 1
            else:
                result[words] += 1
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(punctuations)
    print(uninteresting_words)

    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
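To try the counting step without the wordcloud dependency, it can be pulled out into a function of its own. This is a sketch under assumptions: the name count_interesting_words, the shortened stop-word set, and the sample sentence are hypothetical, and the lowercasing step is an addition that the answer above does not perform.

```python
from collections import Counter

PUNCTUATIONS = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
# Shortened stop-word set for illustration; a set makes "in" checks fast.
UNINTERESTING = {"the", "a", "to", "is", "it", "of", "and"}


def count_interesting_words(text):
    """Count words after stripping edge punctuation and dropping stop words.

    Hypothetical helper, not part of the original assignment code.
    """
    counts = Counter()
    for token in text.split():
        # Assumption: lowercase so that "The" matches "the" and
        # "CAT" is counted together with "cat".
        word = token.strip(PUNCTUATIONS).lower()
        if not word or word in UNINTERESTING:
            continue  # token was only punctuation, or is a stop word
        counts[word] += 1
    return dict(counts)


sample = "The cat, the hat -- and the CAT!"
frequencies = count_interesting_words(sample)
print(frequencies)
```

The returned dictionary has the same word-to-count shape as result in the function above, so it could be handed to cloud.generate_from_frequencies in the same way.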
That did it. This is the solution I needed.