
How to 'loosely' check if a string matches another string in a list

I have a long list of car ad titles and another list of all car makes and models. I am searching the titles to find a match in the makes/models list. I have this so far:

    for make in carmakes:
        if make in title:
            return make

But it doesn't work very well, because the titles are written by people and come with a lot of variations. For example, if the title is 'Nissan D-Max' and I have 'dmax' in my makes/models list, the loop doesn't catch it because it doesn't match exactly. What's the best way to 'loosely' or 'dynamically' check for matches?

I once came across a similar challenge; below is a simplified solution:

import re

def re_compile(*args, flags: int = re.IGNORECASE, **kwargs):
    # Compile case-insensitively by default; extra keyword arguments are forwarded to re.compile.
    return re.compile(*args, flags=flags, **kwargs)

class Term(object):
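    """A make/model described by rules: a string is an alias if no forbid rule fires and at least one matching rule does."""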
    def __init__(self, contain_patterns, *contain_args):
        self.matching_rules = []
        self.forbid_rules = []
        if isinstance(contain_patterns, str):
            self.may_contain(contain_patterns, *contain_args)
        else:
            for cp in contain_patterns:
                self.may_contain(cp, *contain_args)

    def __eq__(self, other):
        return isinstance(other, str) and self.is_alias(other)

    def is_alias(self, s: str):
        return (
            all(not f_rule(s) for f_rule in self.forbid_rules) and
            any(m_rule(s) for m_rule in self.matching_rules)
        )

    def matching_rule(self, f):
        self.matching_rules.append(f)
        return f

    def forbid_rule(self, f):
        self.forbid_rules.append(f)
        return f

    def must_rule(self, f):
        self.forbid_rules.append(lambda s: not f(s))
        return f

    def may_be(self, *re_fullmatch_args):
        self.matching_rules.append(re_compile(*re_fullmatch_args).fullmatch)

    def must_be(self, *re_fullmatch_args):
        fmatch = re_compile(*re_fullmatch_args).fullmatch
        self.forbid_rules.append(lambda s: not fmatch(s))

    def must_not_be(self, *re_fullmatch_args):
        self.forbid_rules.append(re_compile(*re_fullmatch_args).fullmatch)

    def may_contain(self, *re_search_args):
        self.matching_rules.append(re_compile(*re_search_args).search)

    def must_not_contain(self, *re_search_args):
        self.forbid_rules.append(re_compile(*re_search_args).search)

    def may_starts_with(self, *re_match_args):
        self.matching_rules.append(re_compile(*re_match_args).match)

    def must_not_starts_with(self, *re_match_args):
        self.forbid_rules.append(re_compile(*re_match_args).match)

In your case, each car model should be represented as a Term instance with its own regex rules (I do not know much about car brands, so I invented some names):

if __name__ == '__main__':
    # The hyphen is escaped so it is a literal '-', not a range from ' ' to '.'
    dmax = Term((r'd[ \-._\'"]?max', r'Nissan DM'))
    dmax.may_contain(r'nissan\s+last\s+(year)?\s*model')
    dmax.must_not_contain(r'Skoda')
    dmax.must_not_contain(r'Volkswagen')

    @dmax.matching_rule
    def dmax_check(s):
        return re.search(r'double\s+max', s, re.IGNORECASE) and re.search(r'nissan', s, re.IGNORECASE)

    tg = Term(r'Tiguan')
    octav = Term(r'Octavia')

    titles = (
        'Dmax model',
        'd_Max nissan',
        'Nissan Double Max Pro',
        'nissan last model',
        'Skoda octavia',
        'skoda d-max',
        'Nissan Qashqai',
        'VW Polo double max'
    )

Your example:

for car_model in (dmax, tg, octav):
    print(car_model in titles)

Result:

True
False
True
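
The `in` test above works because tuple membership falls back to `==`, and `str.__eq__` defers to the other operand's `__eq__` when it does not recognise the type. A minimal sketch of that mechanism, using a hypothetical `LooseName` class as a stripped-down stand-in for `Term` (the class name and patterns are illustrative, not part of the solution above):

```python
import re

class LooseName:
    """Hypothetical stand-in for Term: equal to any string its pattern is found in."""

    def __init__(self, pattern):
        # Case-insensitive search, like re_compile() in the answer
        self._search = re.compile(pattern, re.IGNORECASE).search

    def __eq__(self, other):
        # str == LooseName returns NotImplemented on the str side,
        # so Python ends up calling this method
        return isinstance(other, str) and self._search(other) is not None

titles = ('Skoda octavia', 'Nissan Qashqai')
octavia = LooseName(r'octavia')
tiguan = LooseName(r'tiguan')

print(octavia in titles)  # True: 'Skoda octavia' compares equal
print(tiguan in titles)   # False: no title compares equal
```

Note that a class defining `__eq__` without `__hash__` becomes unhashable, so such objects cannot be used as dict keys or set members.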

Details:

print(' '*26, 'DMAX TIGUAN OCTAVIA')
for title in titles:
    print(title.ljust(26), (dmax == title), (tg == title), (octav == title))

Result:

                           DMAX TIGUAN OCTAVIA
Dmax model                 True False False
d_Max nissan               True False False
Nissan Double Max Pro      True False False
nissan last model          True False False
Skoda octavia              False False True
skoda d-max                False False False
Nissan Qashqai             False False False
VW Polo double max         False False False
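
If per-model regex rules are more than you need, a simpler and different technique may already cover cases like 'D-Max' vs 'dmax': normalise both strings (lowercase, drop everything except ASCII letters and digits) before the substring test from the question. This is only a sketch; `normalize` and `find_make` are names invented here, and the sample `carmakes` list is made up:

```python
import re

def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r'[^a-z0-9]', '', text.lower())

def find_make(title, carmakes):
    """Return the first make/model whose normalised form occurs in the normalised title."""
    squashed = normalize(title)
    for make in carmakes:
        key = normalize(make)
        # Skip entries that normalise to '' since '' is a substring of everything
        if key and key in squashed:
            return make
    return None

carmakes = ['dmax', 'tiguan', 'octavia']  # made-up sample list

print(find_make('Nissan D-Max', carmakes))    # dmax
print(find_make('Skoda octavia', carmakes))   # octavia
print(find_make('Nissan Qashqai', carmakes))  # None
```

Because spaces are removed too, this can produce false positives across word boundaries (a title containing 'bad max' would normalise to 'badmax' and match 'dmax'), and it cannot express exclusions such as "not if the title mentions Skoda". The rule-based `Term` approach handles both.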
