Matching 2 words in 2 lines and +1 to the matching pair?
So I've got a variable, list, which is always being fed a new line, and a variable, words, which is a big list of single-word strings.
Every time list updates, I want to compare it to words and see if any strings from words are in list. If they do match, let's say the word "and" is in both of them, I then want to print "And: 1". Then, if the next sentence has that word as well, I want to print "And: 2", and so on. If another word comes in, like "The", I want to add 1 to that word's count and print it.
So far I have split the incoming text into an array with text.split(); unfortunately, that is where I'm stuck. I do see some use in [x for x in words if x in list], but I don't know how I would use that. Also, how would I extract the specific word that is matching?
You can use a collections.Counter object to keep a tally for each of the words that you are tracking. To improve performance, use a set for your word list (you said it's big). To keep things simple, assume there is no punctuation in the incoming line data. Case is handled by converting all incoming words to lowercase.
    from collections import Counter

    words = {'and', 'the', 'in', 'of', 'had', 'is'}  # words to keep counts for
    word_counts = Counter()
    lines = ['The rabbit and the mole live in the ground',
             'Here is a sentence with the word had in it',
             'Oh, it also had in in it. AND the and is too']

    for line in lines:
        # lowercase each word and keep only the tracked ones (:= needs Python 3.8+)
        tracked_words = [w for word in line.split() if (w := word.lower()) in words]
        word_counts.update(tracked_words)
        print(*[f'{word}: {word_counts[word]}'
                for word in set(tracked_words)], sep=', ')
Output (the order of the words within each line may vary, because a set is unordered):

    the: 3, and: 1, in: 1
    the: 4, in: 2, is: 1, had: 1
    the: 5, and: 3, in: 4, is: 2, had: 2
Basically, this code takes a line of input, splits it into words (assuming no punctuation), converts these words to lowercase, and discards any words that are not in the main list of words. Then the counter is updated. Finally, the current values of the relevant words are printed.
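The answer above assumes the incoming lines contain no punctuation. If they do, one possible way to handle it is to pull the words out with a regular expression before filtering. This is only a sketch: the helper name tracked_tokens and the pattern [a-z']+ are assumptions made for illustration, not part of the original answer, and the pattern only covers plain ASCII letters and apostrophes.

```python
import re
from collections import Counter

TRACKED = {'and', 'the', 'in', 'of', 'had', 'is'}  # same tracked words as above

def tracked_tokens(line, tracked=TRACKED):
    """Return the lowercase tracked words found in line, ignoring punctuation.

    Hypothetical helper, not part of the original answer.
    """
    # [a-z']+ keeps runs of letters and apostrophes, so "it." becomes "it"
    tokens = re.findall(r"[a-z']+", line.lower())
    return [t for t in tokens if t in tracked]

word_counts = Counter()
for line in ["Oh, it also had in in it. AND the and is too",
             "The end, and the beginning."]:
    found = tracked_tokens(line)
    word_counts.update(found)
    # sorted() gives a stable print order; iterating a plain set does not
    print(', '.join(f'{w}: {word_counts[w]}' for w in sorted(set(found))))
```

Sorting the matched words is a design choice that makes the printed order repeatable between runs.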
This does the trick:
    sentence = "Hello this is a sentence"
    list_of_words = ["this", "sentence"]
    dict_of_counts = {}  # Holds every tracked word that has been seen at least once.

    for word in sentence.split():  # sentence.split() returns a list of the words in the sentence.
        if word in list_of_words:  # Check if the current word is one we are tracking.
            if word in dict_of_counts:
                dict_of_counts[word] += 1  # Key already exists, so add one to its value.
            else:
                dict_of_counts[word] = 1  # Key does not exist yet, so create it with a value of 1.
            print(f"{word}: {dict_of_counts[word]}")  # Print the running count.
The total count is kept in dict_of_counts and would look like this if you print it: {'this': 1, 'sentence': 1}
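The snippet above handles a single sentence, while the question describes a stream of lines. A minimal sketch of how the same dictionary could be carried across several lines follows; the function name update_counts and the sample lines are assumptions made for illustration.

```python
def update_counts(line, tracked, counts):
    """Add the matches from one line to counts and return the words that matched.

    Hypothetical helper: counts is a plain dict that is updated in place.
    """
    matched = []
    for word in line.split():
        if word in tracked:
            counts[word] = counts.get(word, 0) + 1  # get() supplies 0 for a new key
            if word not in matched:
                matched.append(word)  # remember each matching word once per line
    return matched

tracked = {"this", "sentence"}
counts = {}  # lives outside the loop, so the tally survives from line to line
for line in ["Hello this is a sentence", "And this is another one"]:
    for word in update_counts(line, tracked, counts):
        print(f"{word}: {counts[word]}")
```

Because counts is created once and passed in, each new line adds to the running totals instead of starting over. The returned list also answers the question of which specific words matched.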
You should use defaultdict here: missing keys start at 0, so no explicit key check is needed and the loop stays fast.
    from collections import defaultdict

    input_string = "This is an input string"
    list_of_words = ["input", "is"]
    counts = defaultdict(int)  # missing keys default to 0

    for word in input_string.split():
        if word in list_of_words:
            counts[word] += 1
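The loop above only tallies; it does not print anything. As a sketch of how it might be extended to report running totals in the "And: 1" style the question asks for, the variant below lowercases each line, stores the tracked words in a set, and prints after every line. The sample lines and the capitalised output format are assumptions made for illustration.

```python
from collections import defaultdict

list_of_words = {"input", "is"}  # a set, so membership tests are O(1) on average
counts = defaultdict(int)        # missing keys default to 0

for line in ["This is an input string", "Another input is here"]:
    seen = set()  # tracked words that occur in this particular line
    for word in line.lower().split():
        if word in list_of_words:
            counts[word] += 1
            seen.add(word)
    for word in sorted(seen):
        # capitalize() only changes how the word is displayed, not the stored key
        print(f"{word.capitalize()}: {counts[word]}")
```

Only the words found in the current line are printed, so each line of input reports just the counts it changed.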