Django Regex Field Lookup - Sort by Number of Matches for Each Item in QuerySet
I am experimenting with a basic regex as a way of performing a Django filter operation.
I would like to remove any insignificant words from a supplied query string, find any objects whose titles contain any of the remaining words, and then sort the results so that those containing the most words come first.
Here is a quick, simplified example:
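import re

ignored_words = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}
keywords = []
for word in query.split():
    # Compare case-insensitively so 'The' or 'And' are dropped too
    if word.lower() not in ignored_words:
        keywords.append(word)

if keywords:
    # Escape each keyword so regex metacharacters (e.g. '+', '.') match literally
    regex_str = r'(' + '|'.join(re.escape(word) for word in keywords) + ')'
    results = MyModel.objects.filter(title__iregex=regex_str)
    # Now sort them...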
If my query string is 'Delicious Apples and Bananas' and I have three objects with the following titles:

'Apples'
'Bananas'
'Apples and Bananas'
is there an efficient way to order my results by the number of keyword occurrences? More specifically, I'm not sure whether I should be doing some sort of Count() operation while querying, or looping through the results afterwards and doing some additional regex processing then.
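To make the goal concrete, here is a plain-Python sketch (not the author's code, and deliberately outside Django) of the ordering being asked for on the three example titles. The helper names extract_keywords, count_keywords and rank are hypothetical, and "contains a keyword" is assumed to mean a case-insensitive substring match, which is roughly what an unanchored iregex does.

```python
IGNORED_WORDS = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}


def extract_keywords(query):
    # Drop insignificant words, compared case-insensitively.
    return [w for w in query.split() if w.lower() not in IGNORED_WORDS]


def count_keywords(title, keywords):
    # Number of distinct keywords that appear somewhere in the title.
    lowered = title.lower()
    return sum(1 for kw in keywords if kw.lower() in lowered)


def rank(titles, query):
    keywords = extract_keywords(query)
    matching = [t for t in titles if count_keywords(t, keywords) > 0]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(matching, key=lambda t: count_keywords(t, keywords), reverse=True)


titles = ['Apples', 'Bananas', 'Apples and Bananas']
print(rank(titles, 'Delicious Apples and Bananas'))
# ['Apples and Bananas', 'Apples', 'Bananas']
```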
In the end, after filtering the QuerySet, I ran a regex over each object's title in Python and sorted by the number of matches:
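import re

def get_keyword_matches(title, compiled_regex):
    # findall returns one entry per occurrence of any keyword in the title
    return len(compiled_regex.findall(title))

# Compile once, case-insensitively, to mirror the iregex lookup used in the filter
keyword_regex = re.compile(regex_str, re.IGNORECASE)
results = sorted(results, key=lambda my_object: get_keyword_matches(my_object.title, keyword_regex), reverse=True)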
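One detail worth knowing about this approach: findall counts every occurrence, so a title that repeats one keyword can outrank a title that contains several different keywords. If "containing the most words" should mean distinct keywords, a small variation collects the matches into a set first. The helper name count_distinct_keywords and the sample title below are illustrative assumptions, not part of the original answer.

```python
import re


def count_distinct_keywords(title, compiled_regex):
    # With a single capturing group, findall returns the matched text for each
    # occurrence; lower-casing into a set counts each keyword only once.
    return len({match.lower() for match in compiled_regex.findall(title)})


keyword_regex = re.compile(r'(delicious|apples|bananas)', re.IGNORECASE)
title = 'Apples, apples and more apples'
print(len(keyword_regex.findall(title)))              # 3 occurrences
print(count_distinct_keywords(title, keyword_regex))  # 1 distinct keyword
```

This would be used as the sort key in place of get_keyword_matches; everything else stays the same.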
If there's a more efficient way of doing this, however, I'd be keen to hear it.
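A common alternative is to let the database compute the count so that ordering happens in SQL instead of in Python. The sketch below illustrates the idea with Python's built-in sqlite3 module rather than Django: in SQLite each (title LIKE ?) comparison evaluates to 0 or 1, so adding one term per keyword gives the number of distinct keywords a title contains. Note that this uses substring LIKE matching, not a regex, and the table name items and the function rank_titles are hypothetical. In Django, the same idea would typically be expressed by annotating the QuerySet with the sum of Case(When(title__icontains=kw, then=1), default=0, output_field=IntegerField()) over the keywords and then calling order_by on that annotation; that is a pointer to try rather than code verified against a particular Django version.

```python
import sqlite3


def rank_titles(conn, keywords):
    """Return (title, matches) rows ordered by number of distinct keywords matched."""
    if not keywords:
        return []
    # One "(title LIKE ?)" term per keyword; only placeholders are interpolated,
    # keyword values are passed as bound parameters. SQLite evaluates each
    # comparison to 0 or 1 (case-insensitively for ASCII), so the sum is a count.
    score = " + ".join("(title LIKE ? ESCAPE '\\')" for _ in keywords)
    sql = (
        f"SELECT title, matches FROM (SELECT title, {score} AS matches FROM items) "
        "WHERE matches > 0 ORDER BY matches DESC, title"
    )
    params = []
    for kw in keywords:
        # Escape LIKE wildcards so each keyword is matched literally.
        escaped = kw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        params.append(f"%{escaped}%")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Apples",), ("Bananas",), ("Apples and Bananas",), ("Cherries",)],
)
print(rank_titles(conn, ["Delicious", "Apples", "Bananas"]))
# [('Apples and Bananas', 2), ('Apples', 1), ('Bananas', 1)]
```

The trade-off is that filtering, counting and ordering all happen in one query, so only the ranked rows are returned, at the cost of building the expression dynamically from the keyword list.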