How to print the frequency of each unique word from a string using a for loop in Python

Question

The paragraph is meant to contain extra spaces and random punctuation; I removed them in my for loop using .replace. Then I turned the paragraph into a list with .split() to get ['the', 'title', 'etc']. Then I wrote two functions: count_words counts the occurrences of a word, but I didn't want to count every word in the list repeatedly, so I wrote another function that creates a list of unique words. However, I need a for loop that prints each word and how many times it appears, with output something like this:

The word The appears 2 times in the paragraph.
The word titled appears 1 times in the paragraph.
The word track appears 1 times in the paragraph.

I also have a hard time understanding what a for loop essentially does. I read that we should use for loops only for counting and while loops for everything else, but a while loop can also be used for counting.

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  """


for r in ((",", ""), ("!", ""), (".", ""), ("  ", "")):
    paragraph = paragraph.replace(*r)

paragraph_list = paragraph.split()


def count_words(word, word_list):

    word_count = 0
    for i in range(len(word_list)):
        if word_list[i] == word:
            word_count += 1
    return word_count

def unique(word):
    result = []
    for f in word:
        if f not in result:
            result.append(f)
    return result
unique_list = unique(paragraph_list)

Answer 1

It is better to use re together with dict.get and a default value:

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

import re

word_count = {}
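# Raw string so that \? and \. reach the regex engine unchanged
for w in re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower()):
    word_count[w] = word_count.get(w, 0) + 1
# Adjacent separators produce empty strings; drop that entry if present
word_count.pop('', None)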

for k, v in word_count.items():
    print("The word {} appears {} time(s) in the paragraph".format(k, v))

Output:

The word the appears 4 time(s) in the paragraph
The word titled appears 1 time(s) in the paragraph
The word track appears 1 time(s) in the paragraph
...

What to do with Chuu’s is debatable. I decided not to split on the apostrophe, but you can add that later if you want.

Update:

The following line splits paragraph.lower() using a regular expression. The advantage is that you can specify several separators at once:

re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower())
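To see what such a split returns, here is a minimal sketch on a made-up sample string (the string and the names sample, tokens and words are assumptions for illustration, not part of the original answer). It also shows why an empty-string key ends up in the dictionary: two separators in a row, such as a comma followed by a space, leave an empty token between them.

```python
import re

# Hypothetical sample string, chosen only to show how the separators behave
sample = "Hello, world! Hello."

# Same kind of alternation pattern as above, written as a raw string
tokens = re.split(r' |,|!|\?|\.|\n', sample.lower())
print(tokens)  # ['hello', '', 'world', '', 'hello', '']

# Filtering out the empty tokens leaves only the words
words = [t for t in tokens if t]
print(words)   # ['hello', 'world', 'hello']
```

Filtering the tokens before counting is an alternative to deleting the '' entry afterwards.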

With respect to this line:

word_count[w] = word_count.get(w, 0) + 1

word_count is a dictionary. The advantage of using get is that you can define a default value in case w is not in the dictionary yet. The line simply increments the count for word w.
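As a small illustration of that default value, the sketch below tallies a made-up word list (the list and the names explicit and Counter comparison are assumptions added here) once with get and once with an explicit membership check. Both loops produce the same counts, and collections.Counter from the standard library gives the same result in one call.

```python
from collections import Counter

# Hypothetical word list used only for illustration
words = ["the", "track", "the"]

word_count = {}
for w in words:
    # get(w, 0) returns the current count, or 0 when w has no entry yet
    word_count[w] = word_count.get(w, 0) + 1

print(word_count)  # {'the': 2, 'track': 1}

# Equivalent loop without get: the key must be created before incrementing
explicit = {}
for w in words:
    if w not in explicit:
        explicit[w] = 0
    explicit[w] += 1

# Counter performs the same tally
print(Counter(words) == word_count == explicit)  # True
```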

Answer 2

Beware: your example text is simple, but punctuation rules can be complex or not correctly observed. What if the text contains 2 adjacent spaces (incorrect, yes, but frequent)? What if the writer is more used to French and writes spaces before and after a colon or semicolon?

I think the 's construct needs special processing. What about: """John has a bicycle. Mary says that her one is nicer than John's.""" IMHO the word John occurs twice here, while your algorithm will see 1 John and 1 Johns.

Additionally, as Unicode text is now common on web pages, you should be prepared to find higher-code-point equivalents of spaces and punctuation:

“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
’ U+2019 RIGHT SINGLE QUOTATION MARK
‘ U+2018 LEFT SINGLE QUOTATION MARK
  U+00A0 NO-BREAK SPACE
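If you are unsure which character you are looking at, the standard unicodedata module can report its code point and name. A minimal sketch (the variable names are assumptions; the characters are written as escapes so that the no-break space stays visible):

```python
import unicodedata

# The five characters listed above, as escape sequences
chars = "\u201c\u201d\u2019\u2018\u00a0"

names = [unicodedata.name(ch) for ch in chars]
for ch, name in zip(chars, names):
    # Print each code point in U+XXXX form next to its official name
    print("U+{:04X} {}".format(ord(ch), name))
```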

In addition, according to this older question, the best way to remove punctuation is translate. The linked question used Python 2 syntax, but in Python 3 you can do:

import re
import string

paragraph = paragraph.strip()                   # remove initial and terminal white spaces
paragraph = paragraph.translate(str.maketrans('“”’‘\xa0', '""\'\' '))  # map high code punctuation to ASCII
paragraph = re.sub(r"(\w)'s\b", r"\1", paragraph)  # remove possessive 's, keep the word itself
paragraph = paragraph.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
words = paragraph.split()
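As a rough check of these steps on the John example from above, here is a sketch that wraps them in a helper. The name normalize_words is hypothetical and the regular expression for the possessive is one possible choice, not the only one.

```python
import re
import string

def normalize_words(text):
    """Return the words of text after the cleanup steps sketched above."""
    text = text.strip()
    # Map curly quotes and the no-break space to their ASCII counterparts
    text = text.translate(str.maketrans('\u201c\u201d\u2019\u2018\xa0', '""\'\' '))
    # Drop a possessive 's but keep the word it is attached to
    text = re.sub(r"(\w)'s\b", r"\1", text)
    # Delete all remaining ASCII punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text.split()

# The sample uses a curly apostrophe (U+2019) to exercise the first translate step
words = normalize_words("John has a bicycle. Mary says that her one is nicer than John\u2019s.")
print(words.count("John"))  # 2
```

With this cleanup both occurrences are counted as John, and no separate Johns entry appears.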

Answer 3

Please try this one:

paragraph = """  The titled track “Heart Attack” does not interpret the 
feelings of being in love in a serious way, 
but with Chuu’s own adorable emoticon like ways. The music video has 
references to historical and fictional 
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

characterToRemove = (",", "!", ".", "?", '“', '”')
# Remove each unwanted character from the whole paragraph
for ch in characterToRemove:
    paragraph = paragraph.replace(ch, "")

paragraph=paragraph.split()
uniqueWords=set(paragraph)
dictionartWords={}
for i in uniqueWords:
    dictionartWords[i]=0

for i in paragraph:
    if i in dictionartWords.keys():
        dictionartWords[i]+=1

As a result you get a dictionary that has each unique word as a key and, as its value, the number of times that word occurs in the paragraph:

print(dictionartWords)

{'The': 2, 'like': 1, 'serious': 1, 'titled': 1, 'Rene': 1, 'a': 1, 'artist': 1, 'video': 1, 'c': 7, 'with': 1, 'track': 1, 'to': 1, 'fictional': 1, 'feelings': 1, 'ccc': 1, 'but': 1, 'not': 1, 'has': 1, 'interpret': 1, 'way': 1, 'as': 1, 'of': 1, 'emoticon': 1, 'Heart': 1, 'in': 2, 'adorable': 1, 'love': 1, 'references': 1, 'being': 1, 'Magritte': 1, 'Chuu's': 1, 'historical': 1, 'such': 1, 'and': 1, 'does': 1, 'music': 1, 'the': 2, 'figures': 1, 'Attack': 1, 'own': 1, 'ways': 1}
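The question asked for one printed line per word rather than the raw dictionary. A minimal sketch of that last step, assuming a dictionary shaped like the one above (a small hypothetical subset stands in for it here so the block runs on its own):

```python
# Stand-in for the dictionary built above (hypothetical subset of its entries)
dictionartWords = {'The': 2, 'titled': 1, 'track': 1}

lines = []
# items() yields each (word, count) pair exactly once
for word, count in dictionartWords.items():
    lines.append("The word {} appears {} times in the paragraph.".format(word, count))

print("\n".join(lines))
```

The for loop here visits each key of the dictionary once, which is why every unique word is reported exactly one time.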

3 answers

Answer 1 (score 3, accepted, 2018-10-10 07:36:17)

Answer 2 (score 0, 2018-10-10 08:27:42)

Answer 3 (score -1, 2018-10-10 07:22:38)