
How do I remove the uninteresting words and characters with my script?

I can't figure out what I'm doing wrong here. This is only one part of my project; in this last part I'm trying to exclude the punctuation and the uninteresting_words. I can run my script all the way through, but it doesn't remove the punctuation or the uninteresting_words. I've already tried turning punctuations into a list, but instead of a list that breaks the contents into individual items, it's currently just a list with all of the characters as one single list item. As you can see in the code below, I tried saving punctuations.split() as a new variable called char, and I've tried several kinds of if loops and ways of iterating over the words in file_contents.
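Just to illustrate what I mean (this is not my actual code, which follows further down), printing both versions gives:

punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
print(punctuations.split())   # a list with the whole string as its single item
print(list(punctuations))     # a list of individual characters: ['!', '(', ')', '-', ...]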


def calculate_frequencies(file_contents):   # file_contents is being passed in through another 
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # LEARNER CODE START HERE
    char = punctuations.split()
    result = {}
    for words in file_contents.split():
        if words == uninteresting_words:
            pass
        if words.isalnum() and words != uninteresting_words:
            if words not in result:
                result[words] = 1
            else:
                result[words] += 1
            
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(char)
    print(uninteresting_words)
    
    
    # wordcloud - this part and everything after it is OK and working as expected with the code that follows
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()

As the comments say, you should use if words in uninteresting_words:

In any case, I don't think your input text gets split on the punctuation or other special characters. str.split() splits on whitespace by default. Use words.strip(punctuations) to remove the punctuation as well.
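A quick sketch of that behaviour (illustration only, using the same punctuation characters as in your code):

punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"

text = "Hello, world! (test)"
print(text.split())                  # ['Hello,', 'world!', '(test)'] -- split() only breaks on whitespace
print("Hello,".strip(punctuations))  # 'Hello' -- leading/trailing punctuation characters are removed
print("(test)".strip(punctuations))  # 'test'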

You also shouldn't use docstring quoting (''', triple quotes) for an ordinary string. Use ' or " and escape the other characters as needed.
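For example, the two spellings below produce the same string, but only the second makes the intent explicit (and recent Python versions may warn about the stray backslash in the first):

a = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''   # triple quotes; the \ before the , is an unrecognised escape
b = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"     # ordinary string with " and \ escaped explicitly
print(a == b)                            # True -- same characters, clearer intent in b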


import wordcloud   # needed for wordcloud.WordCloud(); in the original project this import happens elsewhere

def calculate_frequencies(file_contents):   # file_contents is being passed in through another
                                            # part of the code that comes before this def
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # LEARNER CODE START HERE
    result = {}
    for words in file_contents.split():
        words = words.strip(punctuations)
        if words in uninteresting_words:
            pass
        else:
            if words not in result:
                result[words] = 1
            else:
                result[words] += 1
            
    print(result)  # this line and the following 2 are just so I can see how they show up
    print(punctuations)
    print(uninteresting_words)
    
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
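If you want to check the output visually, a minimal sketch (assuming the wordcloud and matplotlib packages are installed, and using a hypothetical sample.txt as the input file) could look like this:

import matplotlib.pyplot as plt

with open("sample.txt") as f:                 # sample.txt is just a placeholder file name
    myimage = calculate_frequencies(f.read())

plt.imshow(myimage, interpolation="nearest")  # the function returns an image array from cloud.to_array()
plt.axis("off")
plt.show()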

That should do it.

This is the solution I needed.

https://www.python.org/dev/peps/pep-0257/
