
What can I use for finding the same words in two lists? Python

I am interested in finding the same words in two lists. text_list holds two lists of words, which I have already stemmed.

text_list = [['i', 'am', 'interest' ,'for', 'this', 'subject'], ['this', 'is', 'a', 'second', 'sentence']]
words_list = ['a', 'word', 'sentence', 'interesting']

So I need this output:

same_words= ['a', 'sentence', 'interest']

You need to apply stemming to both lists. There are discrepancies between them, for example interesting versus interest, and if you stem only words_list, sentence becomes sentenc and no longer matches the unstemmed sentence in text_list. So apply the stemmer to both lists and then find the common elements:

from nltk.stem import PorterStemmer

text_list = [['i', 'am', 'interest','for', 'this', 'subject'], ['this', 'is', 'a', 'second', 'sentence']]
words_list = ['a', 'word', 'sentence', 'interesting']

ps = PorterStemmer()
words_list = [ps.stem(w) for w in words_list]
text_list = [list(map(ps.stem,i)) for i in text_list]

answer = []
for i in text_list:
    answer.append(list(set(words_list).intersection(set(i))))

output = sum(answer, [])
print(output)

>>> ['interest', 'a', 'sentenc']
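
Note that the stemmed result contains sentenc rather than sentence. If the original spelling is wanted, as in the question, one option is to compare stems but collect the unstemmed words. The following is only a minimal sketch: common_words and toy_stem are hypothetical names, and toy_stem is a deliberately crude stand-in (it strips a trailing "ing" only) so the snippet runs without NLTK.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ing' only."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


def common_words(text_list, words_list, stem=toy_stem):
    """Return words from text_list whose stem equals the stem of a word in words_list."""
    wanted_stems = {stem(w) for w in words_list}
    found = []
    seen = set()
    for sentence in text_list:
        for word in sentence:
            s = stem(word)
            # Compare on stems, but report each matching stem only once.
            if s in wanted_stems and s not in seen:
                seen.add(s)
                found.append(word)  # keep the original spelling from text_list
    return found


text_list = [['i', 'am', 'interest', 'for', 'this', 'subject'],
             ['this', 'is', 'a', 'second', 'sentence']]
words_list = ['a', 'word', 'sentence', 'interesting']

same_words = common_words(text_list, words_list)
print(same_words)
```

To use the NLTK stemmer from the answer instead of the toy one, pass stem=PorterStemmer().stem as the third argument.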

There is a package called fuzzywuzzy that lets you approximately match the strings in one list against the strings in another list.

First, flatten your nested list into a set of unique strings.

from itertools import chain
newset =  set(chain(*text_list))

{'sentence', 'i', 'interest', 'am', 'is', 'for', 'a', 'second', 'subject', 'this'}

Next, import the fuzz module from the fuzzywuzzy package.

from fuzzywuzzy import fuzz

result = [max([(fuzz.token_set_ratio(i,j),j) for j in newset]) for i in words_list]

[(100, 'a'), (57, 'for'), (100, 'sentence'), (84, 'interest')]

Here, fuzz.token_set_ratio compares every element of words_list with every element of newset and returns a similarity score between 0 and 100 for each pair; max then keeps the best-scoring candidate for each word. You can remove the max to see the full list of scores. (Some letters of for also occur in word, which is why for appears in this list with a score of 57. You can later use a for loop and a tolerance threshold to drop the matches that score below it.)
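
The tolerance filter mentioned above could look roughly like this. It is a sketch, not part of the original answer: the threshold of 80 is an assumed value to tune for your data, and result is copied from the output shown earlier so that the snippet runs without fuzzywuzzy installed.

```python
# (score, best_match) pairs, copied from the output above
result = [(100, 'a'), (57, 'for'), (100, 'sentence'), (84, 'interest')]

# Assumed tolerance: keep only matches scoring at least this much
threshold = 80

filtered = []
for score, word in result:
    if score >= threshold:
        filtered.append(word)

print(filtered)  # ['a', 'sentence', 'interest']
```

With this threshold the weak match for (57) is dropped, which leaves the three words asked for in the question.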

Finally, use zip and map to split the result into a list of scores and a list of matched words.

similarity_score, fuzzy_match = map(list,zip(*result))

fuzzy_match
Out[40]: ['a', 'for', 'sentence', 'interest']

Extra

If your input is not plain ASCII, you can pass an additional argument, force_ascii=False, to fuzz.token_set_ratio:

a = ['У', 'вас', 'є', 'чашка', 'кави?']

b = ['ви']

[max([(fuzz.token_set_ratio(i, j, force_ascii= False),j) for j in a]) for i in b]
Out[9]: [(67, 'кави?')]
