

Matching 2 words in 2 lines and +1 to the matching pair?

So I've got a variable list which is always being fed a new line, and a variable words, which is a big list of single-word strings.

Every time list updates, I want to compare it to words and see whether any strings from words are in list. If they match (let's say the word and is in both of them), I then want to print "And: 1". Then, if the next sentence has that word as well, I want to print "And: 2", and so on. If another word such as The comes in, I want to print +1 for that word too.

So far I have split the incoming text into a list with text.split(); unfortunately, that is where I'm stuck. I do see some use in [x for x in words if x in list], but I don't know how I would use it, or how I would extract the specific word that is matching.
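For reference, the comprehension mentioned above already returns the specific matching words. A minimal sketch with made-up sample data; the names line and line_words are illustrative, and the question's variable list is renamed because list shadows the built-in type:

```python
# Hypothetical sample data standing in for the question's "words" and incoming line.
words = ["and", "the", "mole"]
line = "The rabbit and the mole live in the ground"

# Split the incoming line into its individual words.
line_words = line.split()

# Keep every tracked word that also appears in the line;
# the result is the list of matching words themselves.
matches = [x for x in words if x in line_words]
print(matches)  # ['and', 'the', 'mole']
```

Note that this reports each tracked word at most once per line, even if it occurs several times, and that the comparison is case-sensitive ("The" and "the" are different strings), which is why the answers below lowercase the input or count word by word.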

You can use a collections.Counter object to keep a tally for each of the words that you are tracking. To improve performance, use a set for your word list (you said it's big), since membership tests on a set are much faster than on a list. To keep things simple, assume there is no punctuation in the incoming line data. Case is handled by converting all incoming words to lowercase.

from collections import Counter

words = {'and', 'the', 'in', 'of', 'had', 'is'}    # words to keep counts for
word_counts = Counter()                              # running tally across all lines
lines = ['The rabbit and the mole live in the ground',
         'Here is a sentence with the word had in it',
         'Oh, it also had in in it. AND the and is too']

for line in lines:
    # Lowercase each word and keep only tracked ones (":=" needs Python 3.8+).
    tracked_words = [w for word in line.split() if (w:=word.lower()) in words]
    word_counts.update(tracked_words)
    # Print each tracked word from this line once, with its running total.
    print(*[f'{word}: {word_counts[word]}'
            for word in set(tracked_words)], sep=', ')

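Output (the order of the words on each line may differ between runs, because it comes from iterating over a set):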

the: 3, and: 1, in: 1
the: 4, in: 2, is: 1, had: 1
the: 5, and: 3, in: 4, is: 2, had: 2

Basically, this code takes a line of input, splits it into words (assuming no punctuation), converts these words to lowercase, and discards any words that are not in the main set of words. Then the counter is updated. Finally, the current values of the relevant words are printed.
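The answer above assumes the incoming lines contain no punctuation. One possible way to relax that is sketched below, under the assumption that stripping leading and trailing punctuation (the characters in string.punctuation) from each word is good enough; inner characters such as the apostrophe in "it's" are left alone. The sketch also prints the words in the order they first appear on each line (via dict.fromkeys) rather than in set order. The helper name track and the sample lines are illustrative only.

```python
import string
from collections import Counter

words = {'and', 'the', 'in', 'of', 'had', 'is'}  # words to keep counts for
word_counts = Counter()                           # running tally across all lines

def track(line):
    """Update word_counts from one line and return a printable summary."""
    # Strip surrounding punctuation such as "," or "." and lowercase each word.
    cleaned = (w.strip(string.punctuation).lower() for w in line.split())
    tracked = [w for w in cleaned if w in words]
    word_counts.update(tracked)
    # dict.fromkeys drops duplicates but keeps first-appearance order.
    return ', '.join(f'{w}: {word_counts[w]}' for w in dict.fromkeys(tracked))

for line in ['The rabbit and the mole live in the ground',
             'Oh, it had the mole. AND the rabbit is in, too.']:
    print(track(line))
```

Run as-is, this prints "the: 3, and: 1, in: 1" for the first line and "had: 1, the: 5, and: 2, is: 1, in: 2" for the second; note that "in," is now counted as in.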

This does the trick:


sentence = "Hello this is a sentence"
list_of_words = ["this", "sentence"]

dict_of_counts = {}                     # Holds every tracked word that has been seen at least once.

for word in sentence.split():           # sentence.split() returns a list of the sentence's words; loop over it.
    if word in list_of_words:           # Only count words that are in list_of_words.
        if word in dict_of_counts:
            dict_of_counts[word] += 1   # If this key already exists in the dictionary, add one to its value.
        else:
            dict_of_counts[word] = 1    # If the key does not exist yet, create it with a value of 1.
        print(f"{word}: {dict_of_counts[word]}")  # Print the running count for this word.

The total count is kept in dict_of_counts, which would look like this if you print it: {'this': 1, 'sentence': 1}
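The if/else in that loop can be collapsed with dict.get, which returns a default (here 0) when the key is missing. A minimal sketch of the same approach run over two lines, so that the counts carry over from one line to the next as the question asks; the second line is made up for illustration:

```python
list_of_words = ["this", "sentence"]
dict_of_counts = {}  # persists across lines, so counts keep growing

for sentence in ["Hello this is a sentence", "And this is another sentence"]:
    for word in sentence.split():
        if word in list_of_words:
            # get() returns 0 for a word not seen yet, so no separate branch is needed.
            dict_of_counts[word] = dict_of_counts.get(word, 0) + 1
            print(f"{word}: {dict_of_counts[word]}")

print(dict_of_counts)  # {'this': 2, 'sentence': 2}
```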

You can use a defaultdict here instead, which removes the need to check whether a key already exists before incrementing it.

from collections import defaultdict

input_string = "This is an input string"
list_of_words = ["input", "is"]
counts = defaultdict(int)               # missing keys start at int() == 0

for word in input_string.split():
    if word in list_of_words:
        counts[word] += 1               # no existence check needed with defaultdict
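The defaultdict snippet above only accumulates the counts. A sketch of how it could also print the running tally for each match while processing several incoming lines (the second line is made up for illustration). It also swaps the word list for a set, an assumption based on the question saying the list is big. One caveat worth knowing: reading counts[key] for a missing key inserts that key with value 0, whereas counts.get(key) does not.

```python
from collections import defaultdict

list_of_words = {"input", "is"}  # a set gives fast average-case membership tests
counts = defaultdict(int)        # missing keys start at int() == 0

incoming = ["This is an input string", "Another input is here"]
for line in incoming:
    for word in line.split():
        if word in list_of_words:
            counts[word] += 1
            print(f"{word}: {counts[word]}")

# .get() does not call the default factory, so "missing" is not added.
print(counts.get("missing"))  # None
print(dict(counts))           # {'is': 2, 'input': 2}
```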

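Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.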

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM