

How do I remove the uninteresting words and characters with my script?

I can't figure out what I'm doing wrong here. This is just one part of my project; in this last part I'm trying to exclude punctuation and the uninteresting_words. I can run my script all the way through, but it doesn't remove the punctuation or the uninteresting_words. I've tried turning the punctuation string into a list, but instead of a list with each character as a separate item, I just get a list with all of the characters as a single item. As you can see in the code below, I tried saving punctuations.split() as a new variable named char, and tried several kinds of if statements and iteration over the words in file_contents.
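For reference, a minimal sketch (separate from the project code) that reproduces the behaviour described above, using the same punctuations string:

punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# str.split() with no argument splits on whitespace; the punctuation
# string contains no whitespace, so the whole string comes back as a
# single list item.
char = punctuations.split()
print(char)       # a one-item list containing the entire punctuation string
print(len(char))  # 1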


def calculate_frequencies(file_contents):   # file_contents is being passed in through another 
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
      if words == uninteresting_words:
        pass
      if words.isalnum() and words != uninteresting_words:
        if words not in result:
            result[words]=1
        else:
            result[words]+=1
            
    print(result) # this line and the following 2 are just so I can see how they show up
    print(char)
    print(uninteresting_words)
    
    
    #wordcloud-this part and after is ok and is working as expected with the code that follows 
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()

As the comments say, you should use if words in uninteresting_words: instead.
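A small standalone sketch of the difference: comparing a string to the whole list with == is always False, while in tests membership:

uninteresting_words = ["the", "a", "to"]

word = "the"
print(word == uninteresting_words)  # False: a string never equals a list
print(word in uninteresting_words)  # True: 'in' checks each element for membership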

In any case, I don't think your input text is being split on the special characters in punctuations. str.split() splits on whitespace by default. Use words.strip(punctuations) to strip the punctuation off instead.
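A minimal sketch of both points, reusing the punctuation characters from the question:

punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
text = "Hello, world! (This is a test.)"

# split() with no argument splits on runs of whitespace only,
# so punctuation stays attached to the words.
print(text.split())
# ['Hello,', 'world!', '(This', 'is', 'a', 'test.)']

# strip(punctuations) removes those characters from both ends of a
# word (but not from the middle).
print([word.strip(punctuations) for word in text.split()])
# ['Hello', 'world', 'This', 'is', 'a', 'test']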

You also shouldn't use docstring-style triple quotes (''') for ordinary strings. Use ' or " and escape the other characters as needed.
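For example, these two literals produce the same string; each just escapes the embedded quote characters instead of relying on triple quotes:

# Escaping inside a double-quoted string:
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"

# Escaping inside a single-quoted string:
also_punctuations = '!()-[]{};:\'"\\,<>./?@#$%^&*_~'

print(punctuations == also_punctuations)  # True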


def calculate_frequencies(file_contents):   # file_contents is being passed in through another 
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # LEARNER CODE START HERE
    result = {}
    for words in file_contents.split():
        words = words.strip(punctuations)
        if words in uninteresting_words:
            pass
        else:
            if words not in result:
                result[words] = 1
            else:
                result[words] += 1

    print(result) # this line and the following 2 are just so I can see how they show up
    print(punctuations)
    print(uninteresting_words)
    
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()

That should do it. This is the solution I needed.

https://www.python.org/dev/peps/pep-0257/
