Matching 2 words in 2 lines and +1 to the matching pair?

Question

So I've got a variable list which is constantly being fed a new line, and a variable words which is a big list of single-word strings.

Every time list updates, I want to compare it to words and see whether any strings from words are in list. If they match, say the word "and" is in both, I want to print "And: 1". If the next sentence contains it as well, I want to print "And: 2", and so on. If another word such as "The" comes in, I want to add 1 to that word's count.

So far I have split the incoming text into an array with text.split(); unfortunately, that is where I'm stuck. I can see some use in [x for x in words if x in list], but I don't know how I would use it, or how I would extract the specific word that matches.

Answer 1

You can use a collections.Counter object to keep a tally for each of the words that you are tracking. To improve performance, use a set for your word list (you said it's big), since membership tests on a set do not scan the whole collection. To keep things simple, assume there is no punctuation attached to the tracked words in the incoming line data. Case is handled by converting all incoming words to lowercase.

from collections import Counter

words = {'and', 'the', 'in', 'of', 'had', 'is'}    # words to keep counts for
word_counts = Counter()
lines = ['The rabbit and the mole live in the ground',
         'Here is a sentence with the word had in it',
         'Oh, it also had in in it. AND the and is too']

for line in lines:
    tracked_words = [w for word in line.split() if (w:=word.lower()) in words]
    word_counts.update(tracked_words)
    print(*[f'{word}: {word_counts[word]}'
            for word in set(tracked_words)], sep=', ')

Output

the: 3, and: 1, in: 1
the: 4, in: 2, is: 1, had: 1
the: 5, and: 3, in: 4, is: 2, had: 2

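Basically, this code takes a line of input, splits it into words (assuming no punctuation), converts these words to lowercase, and discards any words that are not in the main list of words. Then the counter is updated. Finally, the current values of the relevant words are printed. Note that the := operator requires Python 3.8 or later, and that the order of the words within each output line may vary because a set is unordered.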
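
If the incoming lines do contain punctuation attached to tracked words, one possible variation is to tokenise with a regular expression instead of split(). This is only a sketch: the helper name count_line and the token pattern are my own assumptions, not part of the code above, and the pattern treats only the letters a-z and apostrophes as word characters.

```python
import re
from collections import Counter

words = {'and', 'the', 'in', 'of', 'had', 'is'}    # words to keep counts for
word_counts = Counter()

def count_line(line):
    """Update word_counts from one line; return counts for the tracked words in it."""
    # Lowercase first, then pull out runs of letters/apostrophes, so that
    # tokens such as 'mole,' or 'ground.' lose their punctuation.
    tokens = re.findall(r"[a-z']+", line.lower())
    tracked = [t for t in tokens if t in words]
    word_counts.update(tracked)
    # dict.fromkeys removes duplicates but keeps first-appearance order.
    return {w: word_counts[w] for w in dict.fromkeys(tracked)}

print(count_line('The rabbit, and the mole, live in the ground.'))
print(count_line('And then? The end.'))
```

Because the result is built from dict.fromkeys rather than a set, the words are reported in the order they first appear in the line.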

Answer 2

This does the trick:


sentence = "Hello this is a sentence"
list_of_words = ["this", "sentence"]

dict_of_counts = {}                     # Holds every tracked word seen at least once.

for word in sentence.split():           # sentence.split() returns a list of the words in the sentence.
    if word in list_of_words:           # Only count words we are tracking.
        if word in dict_of_counts:      # Key already exists: add one to its value.
            dict_of_counts[word] += 1
        else:
            dict_of_counts[word] = 1    # Key does not exist yet: create it with a value of 1.
        print(f"{word}: {dict_of_counts[word]}")  # Print the running count.

The total count is kept in dict_of_counts and would look like this if you print it: {'this': 1, 'sentence': 1}
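
Since the question has new lines arriving all the time, the dictionary needs to live outside whatever handles each line. A rough sketch of that, assuming a helper called tally (a name I made up) that is called once per incoming sentence:

```python
list_of_words = {"this", "sentence"}   # a set, since it is only used for membership tests
dict_of_counts = {}                    # created once, so counts persist between calls

def tally(sentence):
    """Count tracked words in one sentence and return the lines that were printed."""
    printed = []
    for word in sentence.split():
        if word in list_of_words:
            # dict.get returns 0 for a word that has not been seen before
            dict_of_counts[word] = dict_of_counts.get(word, 0) + 1
            printed.append(f"{word}: {dict_of_counts[word]}")
    print("\n".join(printed))
    return printed

tally("Hello this is a sentence")
tally("And this is another one")
```

The second call prints "this: 2" because dict_of_counts still holds the count from the first sentence.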

Answer 3

You can use a defaultdict here. It supplies a default count of 0 for any word it has not seen yet, so no explicit key-existence check is needed.

from collections import defaultdict

input_string = "This is an input string"
list_of_words = ["input", "is"]
counts = defaultdict(int)

for word in input_string.split():
    if word in list_of_words:
        counts[word] += 1
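
The snippet above does not print anything, so here is a small sketch of how the running counts could be reported over more than one line. The second input line and the variable untracked are made up for illustration. One thing to watch with defaultdict: merely reading a missing key inserts it, so use .get() when you only want to look a count up.

```python
from collections import defaultdict

list_of_words = {"input", "is"}      # a set gives fast membership tests
counts = defaultdict(int)            # missing keys start at 0

for line in ["This is an input string", "Another input arrives"]:
    for word in line.split():
        if word in list_of_words:
            counts[word] += 1
            print(f"{word}: {counts[word]}")

# counts["string"] would silently add the key "string" with value 0;
# .get() looks the count up without modifying the dictionary.
untracked = counts.get("string", 0)
print(dict(counts))
```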
