Python - Counting words in a given text

Question

I'm new to coding, so forgive me if I'm asking something that has already been answered, but believe me, I did search for an answer and couldn't find one.

I have a task to count how many of the given words appear in a given text. A word can appear whole or as part of another word. Letter case does not matter. If a word appears several times in the text, it should be counted only once. So far I have managed to come up with this:

def count_words(text, words):
    count = 0
    text = text.lower()
    for w in words:
        if w in text:
            count =+ 1

    print (count)

count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                       {"sum", "hamlet", "infinity", "anything"})

With that code I get a final count of 1 for all three texts, and only the third of those is correct.

As I see it, my first problem is that my text.lower() doesn't do anything, and I thought it should lowercase everything.

My second problem is that in the first case "are" isn't found in "aresjfhdskfhskd", but in the third case "sum" is found in "ipsum". Both of those words are part of a larger word, but the first isn't found and the second is. Also, in the second case the result should be 2, because there are both "banana" and "bananas", which are similar but different.

Thanks in advance.

Answer 1

Using sum and a generator expression, this seems to be the simplest solution:

text = text.lower()
count = sum(word in text for word in words)
# bools are cast to ints (0, 1) here
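
To make the snippet above easier to try, here is a minimal self-contained sketch that wraps it in a function and runs it on the three texts from the question. Returning the count instead of printing it, and lowercasing each word as well as the text, are assumptions added here for illustration; they are not part of the original answer.

```python
def count_words(text, words):
    """Return how many of the given words occur anywhere in text, ignoring case."""
    lowered = text.lower()
    # Each membership test yields True or False; sum() adds them up as 1 or 0.
    # Lowercasing each word too is an added assumption, in case the words are mixed case.
    return sum(word.lower() in lowered for word in words)


print(count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"}))  # 3
print(count_words("Bananas, give me bananas!!!", {"banana", "bananas"}))  # 2
print(count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                  {"sum", "hamlet", "infinity", "anything"}))  # 1
```

Because the in operator does substring matching, "are" is found inside "aresjfhdskfhskd" and "sum" inside "ipsum", which is what the task asks for.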

Answer 2

First, strings are immutable, so text.lower() does not change text itself; it returns a new, lowercased string. The other problem is that if a in base only checks whether a exists in base, with no information about how many times it occurs...

def count_words(text, words):
    lower_text = text.lower()
    for w in words:
        # str.count returns the number of non-overlapping occurrences of w
        print(w + " - " + str(lower_text.count(w)))

print("1")
count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
print("2")
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
print("3")
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {"sum", "hamlet", "infinity", "anything"})
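
The two points in this answer can be checked with a short sketch. The helper name occurrences and the choice to return a dict are assumptions made here for illustration, not part of the answer.

```python
# Strings are immutable: lower() returns a new string and leaves the original alone.
original = "Bananas"
original.lower()                 # result is discarded, original is unchanged
lowered_copy = original.lower()  # keep the result by binding it to a name


def occurrences(text, words):
    """Map each word to the number of non-overlapping times it occurs in text."""
    lowered = text.lower()
    return {w: lowered.count(w.lower()) for w in words}


counts = occurrences("Bananas, give me bananas!!!", {"banana", "bananas"})
# Counting each word only once, as the question requires, means counting
# the words whose number of occurrences is greater than zero.
distinct = sum(1 for n in counts.values() if n > 0)
print(original, lowered_copy, counts["banana"], distinct)
```

Here "banana" occurs twice (once inside each "bananas"), but it contributes only 1 to distinct, so the per-word occurrence count and the answer the task wants are different numbers.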

Answer 3

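Your code is partially wrong: count =+ 1 assigns +1 to count on every match instead of incrementing it, so the result can never exceed 1. Use count += 1 instead. Try this: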

def count_words(text, words):
    count = 0
    lower_text = text.lower()
    for w in words:
        if w in lower_text:
            count += 1

    print count

count_words("How aresjfhdskfhskd you?", {"how", "are", "you", "hello"})
count_words("Bananas, give me bananas!!!", {"banana", "bananas"})
count_words("Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {"sum", "hamlet", "infinity", "anything"})

As written, this only works in Python 2, where print is a statement; if you're using Python 3+ you need to change the final print to print(count).
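
The difference between the question's code and this answer comes down to one operator, which the following sketch isolates. The function names buggy_count and fixed_count are hypothetical, and both return the count rather than printing it so the same code runs on Python 2 and Python 3.

```python
def buggy_count(text, words):
    """The question's logic: '=+' is parsed as 'count = (+1)', not an increment."""
    count = 0
    lowered = text.lower()
    for w in words:
        if w in lowered:
            count =+ 1  # rebinds count to 1 on every match
    return count


def fixed_count(text, words):
    """The same loop with the augmented assignment operator '+='."""
    count = 0
    lowered = text.lower()
    for w in words:
        if w in lowered:
            count += 1  # adds 1 to the running total
    return count


text = "How aresjfhdskfhskd you?"
words = {"how", "are", "you", "hello"}
print(buggy_count(text, words))  # 1
print(fixed_count(text, words))  # 3
```

This would explain why the original code reported 1 for all three texts: any text with at least one match ends at 1, which happens to be correct only for the third text.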

Answer 1
3 2016-05-13 11:35:17

Answer 2
2 2016-05-13 11:47:35

Answer 3
1 2016-05-13 11:36:10