
Python - Count words in a given text

I'm new to coding, so forgive me if this has already been answered; I did search for an answer and couldn't find one.

I have a task to count how many of the given words occur in a given text. A word can appear as a whole word or as part of another word. Letter case does not matter. If a word appears several times in the text, it should be counted only once. So far I have come up with this:

def count_words(text, words):
    count = 0
    text = text.lower()
    for w in words:
        if w in text:
            count =+ 1

    print (count)

count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                       {"sum", "hamlet", "infinity", "anything"})

With that code I get a final count of 1 for all three texts, and only the third result is correct.

As I see it, my first problem is that my text.lower() doesn't seem to do anything, and I thought it should convert the whole text to lowercase.

My second problem is that in the first case "are" isn't found in "aresjfhdskfhskd", but in the third case "sum" is found in "ipsum". Both of those words are part of a larger word, yet the first isn't found and the second is. Also, in the second case the result should be 2, because both "banana" and "bananas" occur; they are similar but different.

Thanks in advance.

Using sum and a generator expression, this seems the simplest solution:

text = text.lower()
count = sum(word in text for word in words)
# bools are cast to ints (0, 1) here
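As a rough sketch, the two lines above could be wrapped into a complete function that returns the count instead of printing it. Lowercasing each word as well is an extra assumption (the question's word sets are already lowercase), added only so that mixed-case words would still match.

```python
def count_words(text, words):
    """Return how many of the given words occur somewhere in text, ignoring case."""
    lower_text = text.lower()
    # Each `in` test yields True or False; sum() adds them up as 1 or 0.
    # word.lower() is an assumption: it lets words like "How" match too.
    return sum(word.lower() in lower_text for word in words)


print(count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"}))
print(count_words("Bananas, give me bananas!!!", {"banana", "bananas"}))
print(count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                  {"sum", "hamlet", "infinity", "anything"}))
```

With the three texts from the question this should print 3, 2 and 1.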

First, strings are immutable, so text.lower() does not change text itself; it returns a new, lowercased string. The other problem is that if a in base only checks whether a occurs in base, without telling you how many times:

def count_words(text, words):
    # Keep the lowercased copy; text itself is left unchanged.
    lower_text = text.lower()
    for w in words:
        # str.count() reports how many times w occurs, not just whether it does.
        print(w + " - " + str(lower_text.count(w)))

print("1")
count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
print("2")
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
print("3")
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {"sum", "hamlet", "infinity", "anything"})
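To make the two points of this answer concrete, here is a small illustration using the second text from the question. The variable names are made up for the example; only str.lower(), the in operator and str.count() are standard.

```python
text = "Bananas, give me bananas!!!"

# lower() returns a new string; the original object is not modified.
lowered = text.lower()
print(text)     # still has the capital B
print(lowered)  # all lowercase

# `in` answers only "does it occur?", while count() answers "how many times?".
print("bananas" in lowered)
print(lowered.count("bananas"))
print(lowered.count("banana"))
```

Both counts come out as 2 here, because "banana" is also found inside each "bananas".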

Your code is partially wrong: the increment has to be written count += 1, whereas count =+ 1 assigns +1 to count every time. Try this:

def count_words(text, words):
    count = 0
    lower_text = text.lower()
    for w in words:
        if w in lower_text:
            count += 1

    print count

count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {"sum", "hamlet", "infinity", "anything"})

This will only work in Python 2.7, though; if you're using Python 3+ you need to change the final print to print(count).
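For what it's worth, the reason the question's version always ends up with 1 seems to be the operator itself: count =+ 1 is parsed as count = (+1), a plain assignment, while count += 1 increments. A minimal sketch of the difference (the function names are hypothetical, chosen just for this comparison):

```python
def buggy_total(matches):
    count = 0
    for _ in range(matches):
        count =+ 1  # same as count = (+1): resets count to 1 on every pass
    return count


def fixed_total(matches):
    count = 0
    for _ in range(matches):
        count += 1  # augmented assignment: adds 1 to the running total
    return count


print(buggy_total(3))
print(fixed_total(3))
```

With three matches the first function returns 1 and the second returns 3, which matches the results described in the question.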
