
How do I use difflib to return a list by searching for an element in the list?

I have a list of lists that looks something like this:

list123 = [["Title a1","100 Price","Company xx aa"], ["Title b1","200 Price","Company yy bb"], ["Title c1","300 Price","Company zz cc"]]

How do I use difflib.get_close_matches (or something else) to return a whole inner list by searching for a specific element of that inner list that matches a search parameter?

How I thought it would work:

print(difflib.get_close_matches('Company xx a', list123))

Expected output (the output I'd like):

 ["Title a1","100 Price","Company xx aa"]

Actual output:

 []

I'm aware of using something like:

for item in list123:
    if "Company xx aa" in item:
        print(item)

But I'd like to use the difflib library (or something else) to allow more "human" searches, where small spelling mistakes are tolerated.

If I misunderstood the purpose of the function, is there another one that can achieve what I'd like?

I tried this:

import difflib

list123 = [["Title a1", "100 Price", "Company xx aa"],
           ["Title b1", "200 Price", "Company yy bb"],
           ["Title c1", "300 Price", "Cpswdaany zsdwz cawdc"]]

for item in list123:
    # Compare the query against the strings of each inner list.
    print(difflib.get_close_matches("Company xx aa", item))

You will have to tweak the function (for instance, its n and cutoff arguments) to control how forgiving the matching should be. You might also check out the related question: Find the closest match between two string variables using difflib.
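As a rough illustration of that tuning, the sketch below shows how the `cutoff` and `n` arguments of `difflib.get_close_matches` change what comes back. The misspelled query `"Compny xx aa"` and the variable names are made up for this example.

```python
import difflib

candidates = ["Title a1", "100 Price", "Company xx aa"]

# The default cutoff is 0.6, so a query with a small typo still matches.
default_matches = difflib.get_close_matches("Compny xx aa", candidates)
print(default_matches)  # ['Company xx aa']

# A very strict cutoff rejects anything that is not almost identical.
strict_matches = difflib.get_close_matches("Compny xx aa", candidates, cutoff=0.99)
print(strict_matches)  # []

# n caps the number of results; the best match comes first.
titles = ["Title a1", "Title b1", "Title c1"]
top_one = difflib.get_close_matches("Title a1", titles, n=1)
print(top_one)  # ['Title a1']
```

Lower cutoff values accept sloppier input at the cost of more false positives, so the right value depends on the data.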

The problem is that the second parameter of get_close_matches should be a list of strings. From the documentation:

possibilities is a list of sequences against which to match word (typically a list of strings).

To fix your issue, do the following:
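One way to satisfy that requirement, sketched here as an alternative and not taken from the answer itself, is to flatten the inner lists into plain strings and remember which inner list each string came from. The `owner` dictionary is a name invented for this example, and the sketch assumes each string appears in only one inner list.

```python
import difflib

list123 = [["Title a1", "100 Price", "Company xx aa"],
           ["Title b1", "200 Price", "Company yy bb"],
           ["Title c1", "300 Price", "Company zz cc"]]

# Map every inner string to the inner list that contains it.
# Assumption: strings are unique across rows; a duplicate keeps the last row.
owner = {text: row for row in list123 for text in row}

# The dictionary keys are plain strings, which is what get_close_matches expects.
matches = difflib.get_close_matches("Company xx a", list(owner), n=1)
result = owner[matches[0]] if matches else None
print(result)  # ['Title a1', '100 Price', 'Company xx aa']
```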

import difflib


def key(choices, keyword='Company xx a'):
    matches = difflib.get_close_matches(keyword, choices)
    if matches:
        best_match, *_ = matches
        return difflib.SequenceMatcher(None, keyword, best_match).ratio()
    return 0.0


list123 = [["Title a1", "100 Price", "Company xx aa"],
           ["Title b1", "200 Price", "Company yy bb"],
           ["Title c1", "300 Price", "Company zz cc"]]

res = max(list123, key=key)

print(res)

Output

['Title a1', '100 Price', 'Company xx aa']

The idea is that the key function returns the similarity of the best match within each inner list; used together with max, it selects the inner list that contains the best match.
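One caveat with max: when no inner list contains a close match, every key is 0.0 and max still returns the first inner list. A possible guard is sketched below; the helper names `best_score` and `find_list` are hypothetical, and returning None for "no match" is an assumption about what the caller wants.

```python
import difflib


def best_score(choices, keyword):
    """Return the similarity of the closest string in choices, or 0.0."""
    matches = difflib.get_close_matches(keyword, choices, n=1)
    if not matches:
        return 0.0
    return difflib.SequenceMatcher(None, keyword, matches[0]).ratio()


def find_list(rows, keyword):
    """Return the best-matching inner list, or None when nothing is close."""
    best = max(rows, key=lambda row: best_score(row, keyword), default=None)
    if best is None or best_score(best, keyword) == 0.0:
        return None
    return best


list123 = [["Title a1", "100 Price", "Company xx aa"],
           ["Title b1", "200 Price", "Company yy bb"],
           ["Title c1", "300 Price", "Company zz cc"]]

print(find_list(list123, "Company xx a"))  # ['Title a1', '100 Price', 'Company xx aa']
print(find_list(list123, "qqqqqq"))        # None
```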
