How to print the frequency of each unique word in a string using a for loop in Python

Question

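The paragraph is meant to contain spaces and random punctuation, which I removed by calling .replace inside my for loop. Then I turned the paragraph into a list with .split() to get ['the', 'title', 'etc']. Then I wrote two functions. The first counts how many times a given word occurs, but I did not want it to report the same word more than once, so the second builds a list of unique words. However, I still need to create a for loop that prints each word and how many times it appears, with output like this: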

The word The appears 2 times in the paragraph.
The word titled appears 1 times in the paragraph.
The word track appears 1 times in the paragraph.

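I am also having trouble understanding what a for loop is essentially for. I read that we should only use for loops for counting and while loops for anything else, yet a while loop can also be used for counting.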

    paragraph = """  The titled track “Heart Attack” does not interpret the 
    feelings of being in love in a serious way, 
    but with Chuu’s own adorable emoticon like ways. The music video has 
    references to historical and fictional 
    figures such as the artist Rene Magritte!!....  """


for r in ((",", ""), ("!", ""), (".", ""), ("  ", "")):
    paragraph = paragraph.replace(*r)

paragraph_list = paragraph.split()


def count_words(word, word_list):

    word_count = 0
    for i in range(len(word_list)):
        if word_list[i] == word:
            word_count += 1
    return word_count

def unique(word):
    result = []
    for f in word:
        if f not in result:
            result.append(f)
    return result
unique_list = unique(paragraph_list)
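
The loop the question asks for can be sketched with the two helpers above: iterate over the unique words and call the counter for each one. The short `sample_words` list is a stand-in for `paragraph_list`, and the `lines` and `while_lines` lists exist only to make the result easy to inspect; all three are assumptions made for illustration. The `while` version produces the same lines, which suggests that `for` is simply the more convenient way to visit every item of a sequence, whereas `while` repeats until a condition stops being true.

```python
def count_words(word, word_list):
    """Count how many times `word` occurs in `word_list`."""
    word_count = 0
    for item in word_list:
        if item == word:
            word_count += 1
    return word_count


def unique(words):
    """Return the distinct words, in order of first appearance."""
    result = []
    for w in words:
        if w not in result:
            result.append(w)
    return result


# Stand-in for paragraph_list (hypothetical sample data).
sample_words = ["The", "titled", "track", "The"]

# for loop: visit each unique word once and report its count.
lines = []
for word in unique(sample_words):
    times = count_words(word, sample_words)
    lines.append("The word {} appears {} times in the paragraph.".format(word, times))

# Equivalent while loop: the index has to be managed by hand.
while_lines = []
unique_words = unique(sample_words)
i = 0
while i < len(unique_words):
    word = unique_words[i]
    times = count_words(word, sample_words)
    while_lines.append("The word {} appears {} times in the paragraph.".format(word, times))
    i += 1

for line in lines:
    print(line)
```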

Answer 1

It is easier with re for the splitting and get with a default value for the counting:

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

import re

word_count = {}
for w in re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower()):
    word_count[w] = word_count.get(w, 0) + 1
word_count.pop('', None)  # discard the empty strings left between adjacent separators

for k, v in word_count.items():
    print("The word {} appears {} time(s) in the paragraph".format(k, v))

Output:

The word the appears 4 time(s) in the paragraph
The word titled appears 1 time(s) in the paragraph
The word track appears 1 time(s) in the paragraph
...

How to treat Chuu’s is debatable. I decided not to split on ’, but you can add that later if needed.

Update:

The line below splits paragraph.lower() using a regular expression. The advantage is that you can describe several separators at once:

re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower())
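
As a small illustration of why the empty string has to be discarded after the split: when two separators are adjacent, `re.split` returns an empty string between them. The `sample` text and the shortened pattern are made up for this sketch.

```python
import re

sample = "Hello, world!! Hello."

# Each alternative in the pattern is one separator: space, comma, "!" or ".".
parts = re.split(r" |,|!|\.", sample.lower())
print(parts)

# Adjacent separators (", " or "!!") leave empty strings behind; filter them out.
words = [p for p in parts if p]
print(words)
```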

Regarding this line:

word_count[w] = word_count.get(w, 0) + 1

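word_count is a dictionary. The advantage of get is that you can supply a default value for the case where w is not yet in the dictionary. The line therefore reads the current count for the word w (0 if it has not been seen), adds 1, and stores the result.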
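
To make the role of the default concrete, here is a minimal comparison on a made-up `words` list: the same count written with an explicit membership test, with `get`, and with `collections.Counter` from the standard library, which packages the same idea.

```python
from collections import Counter

words = ["the", "track", "the"]

# Without get: the first occurrence of a word needs special handling.
explicit = {}
for w in words:
    if w in explicit:
        explicit[w] += 1
    else:
        explicit[w] = 1

# With get: a missing key falls back to the default 0.
with_get = {}
for w in words:
    with_get[w] = with_get.get(w, 0) + 1

# Counter does the same bookkeeping internally.
counted = Counter(words)

print(with_get)
```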

Answer 2

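Beware: the example text is simple, but punctuation rules can be complex, or not correctly followed. What if the text contains 2 adjacent spaces (yes, it is incorrect, but frequent)? What if the writer is more used to French and puts spaces before and after colons or semicolons?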

I think the 's construct needs special handling. What about: """John has a bicycle. Mary says that her one is nicer than John's.""" IMHO, the word John occurs twice here, whereas your algorithm will see 1 John and 1 Johns.
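
A minimal sketch of that difference, using the sentence above. The helper name `count_john` is hypothetical, and the regular expression is only one possible way to drop a trailing `'s`, not necessarily what the author of this answer had in mind.

```python
import re
import string

text = "John has a bicycle. Mary says that her one is nicer than John's."


def count_john(cleaned_text):
    """Strip ASCII punctuation, split on whitespace, and count 'John'."""
    table = str.maketrans("", "", string.punctuation)
    return cleaned_text.translate(table).split().count("John")


# Removing punctuation directly turns "John's" into "Johns".
naive = count_john(text)

# Dropping the possessive 's first keeps both occurrences as "John".
handled = count_john(re.sub(r"'s\b", "", text))

print(naive, handled)
```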

Also, since Unicode text is now common on web pages, you should be prepared to find high code point equivalents of spaces and punctuation:

“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
’ U+2019 RIGHT SINGLE QUOTATION MARK
‘ U+2018 LEFT SINGLE QUOTATION MARK
  U+00A0 NO-BREAK SPACE
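
When it is unclear which look-alike character a text actually contains, the standard `unicodedata` module can report its code point and name. A small sketch, with the characters written as escapes so they cannot be confused with their ASCII counterparts:

```python
import unicodedata

# The characters from the list above, written as escapes to avoid ambiguity.
chars = ["\u201c", "\u201d", "\u2019", "\u2018", "\u00a0"]

names = {}
for ch in chars:
    names["U+{:04X}".format(ord(ch))] = unicodedata.name(ch)

for code_point, name in names.items():
    print(code_point, name)
```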

Also, according to this older question, the best way to remove punctuation is translate. The linked question uses Python 2 syntax, but in Python 3 you can do the following:

import re
import string

paragraph = paragraph.strip()                   # remove initial and terminal white spaces
paragraph = paragraph.translate(str.maketrans('“”’‘\xa0', '""\'\' '))  # fix high code punctuations
paragraph = re.sub(r"'s\b", "", paragraph)      # drop the possessive 's, keeping the word itself
paragraph = paragraph.translate(str.maketrans('', '', string.punctuation))  # remove punctuations
words = paragraph.split()
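
Put together, the cleaning steps above could feed a counting loop like the following. The function name `normalize_words` and the `sample` text are hypothetical, and the counting uses a plain dictionary with `get`, as in Answer 1.

```python
import re
import string


def normalize_words(text):
    """Apply the cleaning steps in order and return the list of words."""
    text = text.strip()
    # Map typographic quotes and the no-break space to ASCII equivalents.
    text = text.translate(str.maketrans("\u201c\u201d\u2019\u2018\xa0", "\"\"'' "))
    # Drop the possessive 's so that "Chuu's" counts as "Chuu".
    text = re.sub(r"'s\b", "", text)
    # Delete the remaining ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


sample = "  \u201cHeart Attack\u201d shows Chuu\u2019s ways. Chuu sings!!  "

frequency = {}
for word in normalize_words(sample):
    frequency[word] = frequency.get(word, 0) + 1

for word, times in frequency.items():
    print("The word {} appears {} time(s) in the paragraph".format(word, times))
```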

Answer 3

Try the following:

paragraph = """  The titled track “Heart Attack” does not interpret the 
feelings of being in love in a serious way, 
but with Chuu’s own adorable emoticon like ways. The music video has 
references to historical and fictional 
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

characterToRemove = (",","!",".","?",'“','”')
for ch in characterToRemove:
    # str.replace removes every occurrence at once, so one pass per character is enough
    paragraph = paragraph.replace(ch, "")

paragraph=paragraph.split()
uniqueWords=set(paragraph)
dictionartWords={}
for i in uniqueWords:
    dictionartWords[i]=0

for i in paragraph:
    if i in dictionartWords.keys():
        dictionartWords[i]+=1

This gives you a dictionary with the unique words as keys and, as values, the number of times each unique word occurs in the paragraph:

print(dictionartWords)

{'The': 2, 'like': 1, 'serious': 1, 'titled': 1, 'Rene': 1, 'a': 1, 'artist': 1, 'video': 1, 'c': 7, 'with': 1, 'track': 1, 'to': 1, 'fictional': 1, 'feelings': 1, 'ccc': 1, 'but': 1, 'not': 1, 'has': 1, 'interpret': 1, 'way': 1, 'as': 1, 'of': 1, 'emoticon': 1, 'Heart': 1, 'in': 2, 'adorable': 1, 'love': 1, 'references': 1, 'being': 1, 'Magritte': 1, 'Chuu’s': 1, 'historical': 1, 'such': 1, 'and': 1, 'does': 1, 'music': 1, 'the': 2, 'figures': 1, 'Attack': 1, 'own': 1, 'ways': 1}
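
Note that this output keeps 'The' and 'the' as separate keys, and that a set has no guaranteed order. If a case-insensitive count sorted by frequency is wanted, one possible variation (not part of the answer above) is to lower-case the words first and sort the items. The `words` list here is a shortened, hypothetical sample.

```python
words = ["The", "music", "the", "artist", "The", "music"]

counts = {}
for w in words:
    key = w.lower()  # fold case so "The" and "the" share one entry
    counts[key] = counts.get(key, 0) + 1

# Sort by descending count, then alphabetically for a stable order.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, times in ranked:
    print("The word {} appears {} time(s) in the paragraph".format(word, times))
```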

Answer 1: score 3, accepted, 2018-10-10 07:36:17

Answer 2: score 0, 2018-10-10 08:27:42

Answer 3: score -1, 2018-10-10 07:22:38
