How to print the frequency of each unique word from a string with a for loop in Python

The paragraph is meant to have spaces and random punctuation; I removed them in my for loop using .replace. Then I turned the paragraph into a list with .split() to get ['the', 'title', 'etc']. Then I wrote two functions: count_words counts the occurrences of a word, but I didn't want it to count every repeated word again, so I wrote another function to create a unique list. However, I need to create a for loop that prints each word and how many times it appears, with output something like this:

The word The appears 2 times in the paragraph.
The word titled appears 1 times in the paragraph.
The word track appears 1 times in the paragraph.

I also have a hard time understanding what a for loop essentially does. I read that we should use for loops only for counting and while loops for everything else, but a while loop can also be used for counting.

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  """


for r in ((",", ""), ("!", ""), (".", ""), ("  ", "")):
    paragraph = paragraph.replace(*r)

paragraph_list = paragraph.split()


def count_words(word, word_list):

    word_count = 0
    for i in range(len(word_list)):
        if word_list[i] == word:
            word_count += 1
    return word_count

def unique(word):
    result = []
    for f in word:
        if f not in result:
            result.append(f)
    return result
unique_list = unique(paragraph_list)
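
The loop being asked for could look roughly like the sketch below. It reuses the two functions above (repeated here, slightly simplified, so the snippet runs on its own); the short `paragraph_list` is a made-up stand-in for the real list, and the `lines` list is only an illustrative name.

```python
def count_words(word, word_list):
    """Count how many times `word` occurs in `word_list`."""
    word_count = 0
    for candidate in word_list:
        if candidate == word:
            word_count += 1
    return word_count


def unique(words):
    """Return the distinct words in first-seen order."""
    result = []
    for w in words:
        if w not in result:
            result.append(w)
    return result


# Hypothetical stand-in for the list built with paragraph.split()
paragraph_list = ["The", "titled", "track", "The", "music"]

# One sentence per unique word, using the counting function for the number
lines = []
for word in unique(paragraph_list):
    lines.append("The word {} appears {} times in the paragraph.".format(
        word, count_words(word, paragraph_list)))

for line in lines:
    print(line)
```

A for loop simply takes each item of a sequence in turn. The same count could be written with a while loop and an index, but for is the more direct fit when walking a list.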

It is better to use re and get with a default value:

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

import re

word_count = {}
for w in re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower()):
    word_count[w] = word_count.get(w, 0) + 1
word_count.pop('', None)  # drop the empty string produced by adjacent separators

for k, v in word_count.items():
    print("The word {} appears {} time(s) in the paragraph".format(k, v))

Output:

The word the appears 4 time(s) in the paragraph
The word titled appears 1 time(s) in the paragraph
The word track appears 1 time(s) in the paragraph
...

What to do with Chuu’s is debatable. I decided not to split on ’, but you can add that later if you want.

Update:

The following line splits paragraph.lower() using a regular expression. The advantage is that you can describe multiple separators:

re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower())
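
As a small illustration of what such a split returns (the `sample` string is invented for this sketch), note that two separators in a row produce empty strings, which is why the empty key is removed from the dictionary afterwards:

```python
import re

sample = "Hello, world!  How are you?"

# Each alternative in the pattern is one separator character
pieces = re.split(r' |,|!|\?', sample)
print(pieces)

# Adjacent separators leave empty strings behind; filter them out
words = [p for p in pieces if p]
print(words)
```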

With respect to this line:

word_count[w] = word_count.get(w, 0) + 1

word_count is a dictionary. The advantage of using get is that you can define a default value in case w is not in the dictionary yet. The line basically updates the count for word w.
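
To make the role of the default concrete, here is a minimal side-by-side sketch (the `words` list and both dictionary names are made up for illustration). The two loops build the same counts:

```python
words = ["the", "track", "the"]

# Explicit version: check whether the key exists first
explicit = {}
for w in words:
    if w in explicit:
        explicit[w] += 1
    else:
        explicit[w] = 1

# Same result with get(): a missing key is treated as 0
with_get = {}
for w in words:
    with_get[w] = with_get.get(w, 0) + 1

print(with_get)
```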

Beware: your example text is simple, but punctuation rules can be complex, or not correctly observed. What if the text contains 2 adjacent spaces (incorrect, but frequent)? What if the writer is more used to French and writes spaces before and after a colon or semicolon?

I think the 's construct needs special processing. What about: """John has a bicycle. Mary says that her one is nicer than John's.""" IMHO the word John occurs twice here, while your algorithm will see 1 John and 1 Johns.

Additionally, as Unicode text is now common on web pages, you should be prepared to find non-ASCII equivalents of spaces and punctuation:

“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
’ U+2019 RIGHT SINGLE QUOTATION MARK
‘ U+2018 LEFT SINGLE QUOTATION MARK
  U+00A0 NO-BREAK SPACE
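
If you are unsure which of these characters a given text contains, one possible way to check is the standard unicodedata module. This is only a sketch; the `text` sample and the `found` dictionary are invented for the example:

```python
import unicodedata

text = "“Heart Attack” and Chuu’s\xa0video"

# Collect each distinct non-ASCII character with its code point and name
found = {}
for ch in text:
    if ord(ch) > 127 and ch not in found:
        found[ch] = "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "UNKNOWN"))

for label in found.values():
    print(label)
```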

In addition, according to this older question, the best way to remove punctuation is translate. The linked question used Python 2 syntax, but in Python 3 you can do:

import re
import string

paragraph = paragraph.strip()                   # remove initial and terminal white spaces
paragraph = paragraph.translate(str.maketrans('“”’‘\xa0', '""\'\' '))  # fix high code punctuations
paragraph = re.sub(r"'s\b", "", paragraph)      # remove 's
paragraph = paragraph.translate(str.maketrans('', '', string.punctuation))  # remove punctuations
words = paragraph.split()
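
Put together and applied to the John example, the steps might look like the sketch below. The helper name `normalize_words` is hypothetical, and stripping every trailing 's is a simplification (it would also turn "it's" into "it"):

```python
import re
import string


def normalize_words(text):
    """Hypothetical helper: clean `text` and return its words."""
    text = text.strip()
    # Map curly quotes and the no-break space to ASCII equivalents
    text = text.translate(str.maketrans('“”’‘\xa0', '""\'\' '))
    # Drop a possessive 's at the end of a word
    text = re.sub(r"'s\b", "", text)
    # Delete the remaining ASCII punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text.split()


words = normalize_words(
    "John has a bicycle. Mary says that her one is nicer than John’s.")

# Count with the same get() idiom used earlier
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

print(counts["John"])
```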

Please try this one:

paragraph = """  The titled track “Heart Attack” does not interpret the 
feelings of being in love in a serious way, 
but with Chuu’s own adorable emoticon like ways. The music video has 
references to historical and fictional 
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

characterToRemove = (",","!",".","?",'“','”')
for i in characterToRemove:
    paragraph = paragraph.replace(i,"")

paragraph=paragraph.split()
uniqueWords=set(paragraph)
dictionartWords={}
for i in uniqueWords:
    dictionartWords[i]=0

for i in paragraph:
    if i in dictionartWords.keys():
        dictionartWords[i]+=1

As a result you get a dictionary which contains the unique words as keys and, as values, the number of times each word occurs in the paragraph:

print(dictionartWords)

{'The': 2, 'like': 1, 'serious': 1, 'titled': 1, 'Rene': 1, 'a': 1, 'artist': 1, 'video': 1, 'c': 7, 'with': 1, 'track': 1, 'to': 1, 'fictional': 1, 'feelings': 1, 'ccc': 1, 'but': 1, 'not': 1, 'has': 1, 'interpret': 1, 'way': 1, 'as': 1, 'of': 1, 'emoticon': 1, 'Heart': 1, 'in': 2, 'adorable': 1, 'love': 1, 'references': 1, 'being': 1, 'Magritte': 1, 'Chuu’s': 1, 'historical': 1, 'such': 1, 'and': 1, 'does': 1, 'music': 1, 'the': 2, 'figures': 1, 'Attack': 1, 'own': 1, 'ways': 1}
