Compare words in two lists in Python
I would appreciate someone's help with this probably simple problem. I have a long list of words in the form ['word', 'another', 'word', 'and', 'yet', 'another']. I want to compare these words against a list that I specify, checking whether each target word is contained in the first list or not.
I would like to output which of my "search" words are contained in the first list and how many times they appear. I tried something like list(set(a).intersection(set(b))), but it splits up the words and compares letters instead.
How can I specify a list of words to compare with the existing long list? And how can I output the co-occurrences (words present in both lists) and their frequencies? Thank you so much for your time and help.
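A likely cause of the letter-by-letter comparison, assuming `b` was a plain string such as `'word and but'` rather than a list: `set()` on a string produces a set of its characters. The sketch below uses a hypothetical `b_string` to show this, and then splits the string into words first. Note that the intersection only tells you which words are shared, not how often they occur.

```python
a = ['word', 'another', 'word', 'and', 'yet', 'another']

# Hypothetical: b was accidentally a string, not a list of words
b_string = 'word and but'
print(sorted(set(b_string)))  # prints [' ', 'a', 'b', 'd', 'n', 'o', 'r', 't', 'u', 'w']

# Intersecting words with single characters finds nothing here
letters_result = set(a).intersection(set(b_string))
print(letters_result)  # prints set()

# Splitting the string into words first gives a word-level comparison
b = b_string.split()
common = sorted(set(a).intersection(set(b)))
print(common)  # prints ['and', 'word']
```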
>>> lst = ['word', 'another', 'word', 'and', 'yet', 'another']
>>> search = ['word', 'and', 'but']
>>> [(w, lst.count(w)) for w in set(lst) if w in search]
[('and', 1), ('word', 2)]
This code iterates through the unique elements of lst, and if an element is in the search list, it adds the word, along with its number of occurrences, to the resulting list. Because it iterates over a set, the order of the pairs in the result is not guaranteed.
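If you want the results in a predictable order, one variant (a sketch, not part of the original answer) is to iterate over `search` instead of `set(lst)`. It assumes `search` contains no duplicates. Each `lst.count(w)` scans the whole list, so for a very long list the `Counter` approach scales better.

```python
lst = ['word', 'another', 'word', 'and', 'yet', 'another']
search = ['word', 'and', 'but']

# Iterate over search so the output follows the order of the search words
result = [(w, lst.count(w)) for w in search if w in lst]
print(result)  # prints [('word', 2), ('and', 1)]
```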
Alternatively, preprocess your list of words with a Counter:
from collections import Counter
a = ['word', 'another', 'word', 'and', 'yet', 'another']
c = Counter(a)
# c == Counter({'word': 2, 'another': 2, 'and': 1, 'yet': 1})
Now you can iterate over your new list of words and check whether each one is contained in this Counter (a dict subclass); the value gives its number of appearances in the original list:
words = ['word', 'no', 'another']
for w in words:
    print(w, c.get(w, 0))  # Python 3 print function; 0 if w never appears
which prints:
word 2
no 0
another 2
or collect the results in a list:
[(w, c.get(w, 0)) for w in words]
# returns [('word', 2), ('no', 0), ('another', 2)]
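If you only want the search words that actually occur, one option (a sketch, not part of the original answer) is to filter on membership in the Counter. Indexing a Counter with a missing key returns 0 instead of raising KeyError, and does not add the key, so `c[w]` behaves like `c.get(w, 0)` here.

```python
from collections import Counter

a = ['word', 'another', 'word', 'and', 'yet', 'another']
words = ['word', 'no', 'another']
c = Counter(a)

# A missing key yields 0 rather than a KeyError
print(c['no'])  # prints 0

# Keep only the search words that occur at least once, with their counts
found = {w: c[w] for w in words if w in c}
print(found)  # prints {'word': 2, 'another': 2}
```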