
How do I remove the uninteresting words and characters with my script?

I cannot figure out what I'm doing wrong here. This is just one part of my project: in the final step I am trying to exclude punctuation and the words in uninteresting_words. The script runs in full, but it removes neither. I tried turning punctuations into a list, but the result is not a list of individual characters; it is a list with a single item that contains all the characters. As you can see in the code below, I saved punctuations.split() in a new variable called char, and I tried several combinations of if statements and loops to work through the words in file_contents.


def calculate_frequencies(file_contents):   # file_contents is being passed in through another 
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
      if words == uninteresting_words:
        pass
      if words.isalnum() and words != uninteresting_words:
        if words not in result:
            result[words]=1
        else:
            result[words]+=1
            
    print(result) # this line and the following 2 are just so i can see what how they show up
    print(char)
    print(uninteresting_words)
    
    
    #wordcloud-this part and after is ok and is working as expected with the code that follows 
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()

As the comment said, you should use if words in uninteresting_words: instead of comparing with ==. Comparing a single word to the whole list with == is always False, so nothing gets filtered out.
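To illustrate the difference, here is a minimal sketch. The shortened list `stop_words` and the variable `word` are made up for this example and are not part of the original script.

```python
# Hypothetical, shortened stop-word list for illustration only.
stop_words = ["the", "a", "to"]
word = "the"

# A string is never equal to a list, so this comparison is always False.
equal_to_list = (word == stop_words)

# Membership test: True when the word is one of the list's items.
in_list = (word in stop_words)

print(equal_to_list, in_list)  # False True
```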

I also don't think your input text is being split on punctuation. str.split() splits on whitespace by default, so punctuation stays attached to the words (and punctuations.split() returns a one-item list, because that string contains no whitespace). Use words.strip(punctuations) to remove punctuation from the start and end of each word.
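A short sketch of how these calls behave, using a made-up sample sentence. `clean_token` is a hypothetical helper name, not something from the original script. Note that strip only removes characters from the two ends of a token, so punctuation inside a word (such as the apostrophe in "don't") is kept.

```python
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"

# split() with no argument splits on whitespace; the punctuation string
# contains no whitespace, so the result is a one-item list.
as_one_item = punctuations.split()

# list() turns a string into a list of its individual characters.
as_chars = list(punctuations)


def clean_token(token):
    """Hypothetical helper: strip punctuation from both ends of a token."""
    return token.strip(punctuations)


sample = "Hello, world! (don't panic)"
tokens = [clean_token(t) for t in sample.split()]
print(tokens)  # ['Hello', 'world', "don't", 'panic']
```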

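I would also avoid the triple quote (''') for this string. It is valid Python, but triple quotes are conventionally used for docstrings and multi-line strings; for a one-line string, use ' or " and escape other characters as needed. In particular, \, is not a recognized escape sequence, so write the backslash as \\.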
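As a quick check that both spellings describe the same characters, the sketch below compares them. It uses a raw triple-quoted literal (r'''...''') as a stand-in for the string from the question so that the lone backslash is unambiguous; that substitution is an assumption made for this example.

```python
# Raw triple-quoted literal standing in for the string from the question.
triple_quoted = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# Regular double-quoted literal with the quote and the backslash escaped.
escaped = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"

# Both literals produce an ordinary str with identical contents.
print(triple_quoted == escaped)  # True
print(type(triple_quoted).__name__)  # str
```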


import wordcloud


def calculate_frequencies(file_contents):   # file_contents is passed in by code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my",
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them",
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being",
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how",
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    result = {}
    for word in file_contents.split():
        # Remove punctuation from both ends and lowercase, so "The," matches "the"
        word = word.strip(punctuations).lower()
        # Skip tokens that were only punctuation, and the uninteresting words
        if not word or word in uninteresting_words:
            continue
        result[word] = result.get(word, 0) + 1

    print(result)  # debug output, just to see how the counts look

    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()

That should do it.

This was the solution that I needed.

PEP 257 (docstring conventions): https://www.python.org/dev/peps/pep-0257/

