
Compare words in two lists in Python

I would appreciate some help with what is probably a simple matter. I have a long list of words of the form ['word', 'another', 'word', 'and', 'yet', 'another']. I want to compare these words against a list that I specify, to check whether my target words are contained in the first list.

I would like to output which of my "search" words are contained in the first list and how many times they appear. I tried something like list(set(a).intersection(set(b))), but it splits up the words and compares letters instead.

How can I write a list of words to compare with the existing long list? And how can I output the co-occurrences and their frequencies? Thank you so much for your time and help.
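One plausible cause of the letter-by-letter comparison is that one of the arguments was a plain string rather than a list of strings; the question does not show the actual values, so this is an assumption. The sketch below uses hypothetical names (b_string, b_list) to show the difference:

```python
a = ['word', 'another', 'word', 'and', 'yet', 'another']

# Hypothetical mistake: the search term is a bare string, not a list.
b_string = 'word'
# set('word') is {'w', 'o', 'r', 'd'}, so no whole word in a can match.
letters_only = set(a).intersection(set(b_string))

# Putting the search terms in a list keeps each word intact.
b_list = ['word', 'and', 'but']
common = sorted(set(a).intersection(b_list))

print(letters_only)  # set()
print(common)        # ['and', 'word']
```

The intersection only reports which words are shared; it does not say how often each one appears.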

>>> lst = ['word', 'another', 'word', 'and', 'yet', 'another']
>>> search = ['word', 'and', 'but']
>>> [(w, lst.count(w)) for w in set(lst) if w in search]
[('and', 1), ('word', 2)]

This code iterates over the unique elements of lst; if an element is also in the search list, it adds the word, along with its number of occurrences, to the resulting list.
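Because a set has no guaranteed iteration order, the order of the tuples above can vary between runs. A minimal variant, assuming the search list contains no duplicates, iterates over search instead so that the result follows the order of the search words (the name present is only illustrative):

```python
lst = ['word', 'another', 'word', 'and', 'yet', 'another']
search = ['word', 'and', 'but']

# Build the set once so each membership test is fast.
present = set(lst)

# Iterate over search (a list) to keep a predictable order.
result = [(w, lst.count(w)) for w in search if w in present]
print(result)  # [('word', 2), ('and', 1)]
```

Each call to lst.count scans the whole list, so this is best suited to a short list of search words.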

Preprocess your list of words with a Counter:

from collections import Counter
a = ['word', 'another', 'word', 'and', 'yet', 'another']
c = Counter(a)
# c == Counter({'word': 2, 'another': 2, 'and': 1, 'yet': 1})

Now you can iterate over your new list of words and check whether each one is contained in this Counter dictionary; the value gives the number of times the word appears in the original list:

words = ['word', 'no', 'another']

for w in words:
    print(w, c.get(w, 0))

which prints:

word 2
no 0
another 2

Or collect the output in a list:

[(w, c.get(w, 0)) for w in words]
# returns [('word', 2), ('no', 0), ('another', 2)]
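If only the search words that actually occur are wanted, as the question asks, the same Counter can be filtered. This is a small sketch rather than part of the original answer; count_matches is a hypothetical helper name:

```python
from collections import Counter


def count_matches(words, targets):
    """Return {target: count} for each target that occurs in words."""
    counts = Counter(words)  # one pass over the long list
    # Keep only the targets that appear at least once.
    return {t: counts[t] for t in targets if t in counts}


a = ['word', 'another', 'word', 'and', 'yet', 'another']
found = count_matches(a, ['word', 'no', 'another'])
print(found)  # {'word': 2, 'another': 2}
```

Building the Counter takes one pass over the long list, after which each lookup is a dictionary access, so this scales better than calling list.count for every search word.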
