Django regex field lookup: ordering by the number of matches for each item in a QuerySet

Question

I am experimenting with a basic regular expression as a way of performing a Django filter operation.

I would like to remove any insignificant words from a supplied query string, find all objects whose titles contain any of the remaining words, and then sort the results, starting with those that contain the most words.

Here is a quick, simplified example:

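import re

ignored_words = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}

# query is the user's search string, e.g. 'Delicious Apples and Bananas'
keywords = []
for word in query.split():
    # Compare case-insensitively so that 'And' or 'The' are ignored too
    if word.lower() not in ignored_words:
        # Escape the word so characters such as '+' or '(' are matched literally
        keywords.append(re.escape(word))

if keywords:
    regex_str = r'(' + '|'.join(keywords) + ')'
    results = MyModel.objects.filter(title__iregex=regex_str)
    # Now sort them...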

If my query string were 'Delicious Apples and Bananas' and I had three objects with the following titles:

'Apples'
'Bananas'
'Apples and Bananas'
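To make the intended ordering concrete, here is a minimal plain-Python sketch (no Django involved) that applies the same keyword extraction to the sample query and ranks the three sample titles. It assumes that "the most words" means the number of distinct keywords a title contains; `extract_keywords` and `rank_titles` are hypothetical helpers, not part of the question.

```python
import re

IGNORED_WORDS = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}


def extract_keywords(query):
    """Lower-case the query and drop insignificant words (hypothetical helper)."""
    return [w for w in query.lower().split() if w not in IGNORED_WORDS]


def rank_titles(titles, keywords):
    """Return (title, score) pairs for titles matching at least one keyword,
    where score is the number of distinct keywords found in the title."""
    if not keywords:
        # An empty alternation would match the empty string everywhere
        return []
    pattern = re.compile('|'.join(re.escape(k) for k in keywords), re.IGNORECASE)
    scored = []
    for title in titles:
        # Collect the distinct keywords present, ignoring case
        found = {m.lower() for m in pattern.findall(title)}
        if found:
            scored.append((title, len(found)))
    # Most distinct keywords first; ties keep their original order (sort is stable)
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


keywords = extract_keywords('Delicious Apples and Bananas')
print(keywords)  # ['delicious', 'apples', 'bananas']
print(rank_titles(['Apples', 'Bananas', 'Apples and Bananas'], keywords))
# [('Apples and Bananas', 2), ('Apples', 1), ('Bananas', 1)]
```

Counting distinct keywords means a title such as 'Apples Apples' scores 1, not 2; counting every occurrence instead would just be `len(pattern.findall(title))`.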

Is there an efficient way to order my results by the number of keyword occurrences? More specifically, I'm not sure whether I should perform some sort of Count() operation while querying, or loop through the results afterwards and do some additional regex processing then.
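On the "count while querying" option: the score can in principle be computed by the database in the same query as the filter, so no Python loop is needed. Django is not used below; as a self-contained illustration, this sketch uses Python's built-in `sqlite3` module with a hypothetical `items` table, scores each row by how many distinct keywords its title contains, and orders by that score. In Django a comparable per-keyword 0/1 score would typically be built with `Case`/`When` annotations and added together, but that translation is not shown here and should be checked against your database backend.

```python
import sqlite3


def rank_titles_in_db(conn, keywords):
    """Score and order rows inside SQL (hypothetical 'items' table with a 'title' column)."""
    if not keywords:
        return []
    # One "(instr(...) > 0)" term per keyword; in SQLite each term evaluates to 0 or 1.
    # Only '?' placeholders are interpolated; keyword values are passed as parameters.
    score = " + ".join("(instr(lower(title), ?) > 0)" for _ in keywords)
    sql = (
        f"SELECT title, matches FROM "
        f"(SELECT title, {score} AS matches FROM items) "
        "WHERE matches > 0 ORDER BY matches DESC, title"
    )
    params = [kw.lower() for kw in keywords]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Apples",), ("Bananas",), ("Apples and Bananas",), ("Cherries",)],
)
rows = rank_titles_in_db(conn, ["delicious", "apples", "bananas"])
print(rows)  # [('Apples and Bananas', 2), ('Apples', 1), ('Bananas', 1)]
```

Like the `iregex` filter, this is a substring match, so 'apples' would also match 'Pineapples'.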

Answer 1

In the end, I applied a regex to the QuerySet results in Python after filtering.

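import re

def get_keyword_matches(title, regex):
    # Count every keyword occurrence in the title. IGNORECASE mirrors title__iregex,
    # so capitalised keywords such as 'Apples' still match
    return len(re.findall(regex, title, flags=re.IGNORECASE))

# sorted() evaluates the QuerySet and returns a plain list, most matches first
results = sorted(results, key=lambda my_object: get_keyword_matches(my_object.title, regex_str), reverse=True)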

However, if there's a more efficient way of doing this, I'd be keen to hear it.
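One possible refinement of this in-Python approach, assuming only the first few results are actually displayed (an assumption about the use case, not part of the original answer): compile the pattern once and use `heapq.nlargest` instead of sorting the whole list. The `Item` namedtuple below is a hypothetical stand-in for model instances. Python's `re` module already caches compiled patterns, so most of the gain, if any, comes from not fully sorting a large result list.

```python
import heapq
import re
from collections import namedtuple

# Hypothetical stand-in for model instances; only the .title attribute is used
Item = namedtuple('Item', 'title')


def top_matches(objects, keywords, n):
    """Return up to n objects with the most keyword occurrences in their title."""
    if not keywords:
        return []
    # Compile once, outside the key function, and match case-insensitively
    pattern = re.compile('|'.join(map(re.escape, keywords)), re.IGNORECASE)
    # nlargest behaves like sorted(..., reverse=True)[:n] without a full sort
    return heapq.nlargest(n, objects, key=lambda obj: len(pattern.findall(obj.title)))


items = [Item('Apples'), Item('Bananas'), Item('Apples and Bananas')]
best = top_matches(items, ['delicious', 'apples', 'bananas'], 2)
print([obj.title for obj in best])  # ['Apples and Bananas', 'Apples']
```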

Solution 1: score 0, accepted, 2018-12-17 12:41:12
