
Python: how to find out the occurrences of a sentence in a list

I'm writing a function that counts how many times a word occurs in a list of elements read from a text file, which is pretty straightforward to achieve.

However, I have spent two days trying to figure out how to count occurrences of a string that contains multiple words (two or more).

So, for example, say the string is:

"hello bye"

and the list is:

["car", "hello","bye" ,"hello"]

The function should return 1, because the elements "hello" and "bye" occur consecutively only once.


The closest I've gotten to a solution is using

words[0:2] = [' '.join(words[0:2])]

which joins two elements together given their indices. This is wrong, however, because the input will be the element itself rather than an index.

Can someone point me in the right direction?

Match the string against the join of each run of consecutive elements in the main list. Below is the sample code:

my_list = ["car", "hello","bye" ,"hello"]
sentence = "hello bye"
word_count = len(sentence.split())
c = 0

for i in range(len(my_list) - word_count + 1):
    if sentence == ' '.join(my_list[i:i+word_count]):
        c+=1

The final value held by c will be:

>>> c
1
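
As a rough sketch, the loop above could be wrapped in a reusable function. The name count_phrase and the empty-phrase guard are illustrative assumptions, not part of the original answer. This variant compares list slices directly instead of joining them into strings, which assumes no list element itself contains a space.

```python
def count_phrase(words, phrase):
    """Count how often the words of `phrase` appear consecutively in `words`."""
    target = phrase.split()
    n = len(target)
    if n == 0:
        return 0  # assumption: an empty phrase counts as zero occurrences
    # Slide a window of length n over the list and compare slices directly.
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


print(count_phrase(["car", "hello", "bye", "hello"], "hello bye"))  # prints 1
```

Overlapping matches are counted separately, so count_phrase(["a", "a", "a"], "a a") gives 2.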

If you are looking for a one-liner, you may use zip and sum as:

>>> my_list = ["car", "hello","bye" ,"hello"]
>>> sentence = "hello bye"
>>> words = sentence.split()

>>> sum(1 for i in zip(*[my_list[j:] for j in range(len(words))]) if list(i) == words)
1
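
To see why the one-liner works, it may help to unpack it into steps. The intermediate names shifted, windows and matches below are introduced only for illustration.

```python
my_list = ["car", "hello", "bye", "hello"]
words = "hello bye".split()

# One copy of the list per word in the phrase, each shifted one step further.
shifted = [my_list[j:] for j in range(len(words))]
# zip stops at the shortest copy, so it yields every window of len(words) items.
windows = list(zip(*shifted))
# zip produces tuples, hence the list() conversion before comparing.
matches = sum(1 for window in windows if list(window) == words)

print(shifted)  # [['car', 'hello', 'bye', 'hello'], ['hello', 'bye', 'hello']]
print(windows)  # [('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
print(matches)  # 1
```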

Let's split this problem into two parts. First, we write a function that returns the n-grams of a given list, that is, the sublists of n consecutive elements:

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

We can now get 2-, 3- or 4-grams easily:

>>> ngrams(["car", "hello","bye" ,"hello"], 2)
[('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 3)
[('car', 'hello', 'bye'), ('hello', 'bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 4)
[('car', 'hello', 'bye', 'hello')]

Each item is made into a tuple.

Now make the phrase 'hello bye' into a tuple:

>>> as_tuple = tuple('hello bye'.split())
>>> as_tuple
('hello', 'bye')
>>> len(as_tuple)
2

Since this has 2 words, we need to generate bigrams from the sentence and count the number of matching bigrams. We can generalize all this to

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

def count_occurrences(sentence, phrase):
    phrase_as_tuple = tuple(phrase.split())
    sentence_ngrams = ngrams(sentence, len(phrase_as_tuple))
    return sentence_ngrams.count(phrase_as_tuple)

print(count_occurrences(["car", "hello","bye" ,"hello"], 'hello bye'))
# prints 1
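
For long word lists, the same idea can be sketched without building the full n-gram list or copying slices. The names iter_ngrams and count_occurrences_lazy are hypothetical, and itertools.islice stands in for the slicing used above; this is one possible variant, not the original answer's method.

```python
from itertools import islice


def iter_ngrams(items, n):
    """Lazily yield tuples of n consecutive items from a list."""
    # islice(items, i, None) skips the first i items without copying the list.
    return zip(*(islice(items, i, None) for i in range(n)))


def count_occurrences_lazy(sentence, phrase):
    phrase_as_tuple = tuple(phrase.split())
    # Count matching n-grams one at a time instead of materialising them all.
    return sum(1 for gram in iter_ngrams(sentence, len(phrase_as_tuple))
               if gram == phrase_as_tuple)


print(count_occurrences_lazy(["car", "hello", "bye", "hello"], "hello bye"))  # prints 1
```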

Two possibilities.

## laboriously

lookFor = 'hello bye'
words = ["car", "hello","bye" ,"hello", 'tax', 'hello', 'horn', 'hello', 'bye']

strungOutWords = ' '.join(words)

count = 0
p = 0
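while True:
    # find() returns an index relative to the slice, or -1 if there is no match
    q = strungOutWords[p:].find(lookFor)
    if q == -1:
        break
    # resume the search one character past the start of this match
    p = p + q + 1
    count += 1

print(count)  # prints 2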

## using a regex

import re
# re.escape keeps any regex metacharacters in lookFor literal
print(len(re.findall(re.escape(lookFor), strungOutWords)))  # prints 2
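
One caveat with searching the joined string is that a match can start or end in the middle of a word. The sketch below shows the problem and one possible regex-based workaround; the function name count_phrase_regex and the sample list are assumptions made for illustration, and the phrase is assumed to use single spaces between words.

```python
import re


def count_phrase_regex(words, phrase):
    """Count whole-word, possibly overlapping matches of `phrase` in `words`."""
    if not phrase:
        return 0  # assumption: an empty phrase counts as zero occurrences
    text = " ".join(words)
    # re.escape keeps metacharacters in the phrase literal. The lookarounds
    # demand whitespace or the end of the text on both sides, and wrapping the
    # phrase in a lookahead makes each match zero-width so overlaps are counted.
    pattern = r"(?<!\S)(?=" + re.escape(phrase) + r"(?!\S))"
    return len(re.findall(pattern, text))


words = ["othello", "byelaw", "hello", "bye"]
print(" ".join(words).count("hello bye"))      # prints 2 (one is a false positive)
print(count_phrase_regex(words, "hello bye"))  # prints 1
```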

I would suggest reducing the problem to counting occurrences of a string within another string.

words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
sentence = "hello bye"
my_text = " %s " % " ".join([item for sublist in [x.split() for x in words] for item in sublist])


def count(sentence):
    my_sentence = " %s " % " ".join(sentence.split())
    return my_text.count(my_sentence)


print(count("hello bye"))  # prints 2
print(count("pet shop"))   # prints 0
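
Note that str.count only counts non-overlapping matches, so with space-padded strings two back-to-back occurrences that share a single space are counted once. A possible workaround, sketched here under the assumption that such repeats should each count, is to step through the text with str.find. The name count_padded is hypothetical.

```python
def count_padded(words, phrase):
    """Count space-delimited matches, including ones that share a space."""
    if not phrase.split():
        return 0  # assumption: an empty phrase counts as zero occurrences
    # Normalise whitespace and pad both strings, as in the answer above.
    text = " %s " % " ".join(" ".join(words).split())
    needle = " %s " % " ".join(phrase.split())
    count = 0
    start = text.find(needle)
    while start != -1:
        count += 1
        # Step forward by one character only, so the space shared between
        # two back-to-back matches can serve both of them.
        start = text.find(needle, start + 1)
    return count


print(" hello bye hello bye ".count(" hello bye "))                # prints 1
print(count_padded(["hello", "bye", "hello", "bye"], "hello bye"))  # prints 2
```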
