I am experimenting with a basic regular expression as a way of performing a Django filter operation.
I would like to remove any insignificant words from a supplied query string, look for any objects with titles containing any of the remaining words, and then sort starting with those containing the most words.
Using a quick and simplified example:
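    import re

    ignored_words = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}
    # Keep only the significant words; compare in lowercase so 'And' is ignored too
    keywords = [word for word in query.split() if word.lower() not in ignored_words]
    if keywords:
        # Escape each keyword so regex metacharacters in the query are matched literally
        regex_str = r'(' + '|'.join(re.escape(word) for word in keywords) + ')'
        results = MyModel.objects.filter(title__iregex=regex_str)
        # Now sort them...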
If my query string was 'Delicious Apples and Bananas'
and I had three objects with the following titles:
'Apples'
'Bananas'
'Apples and Bananas'
is there an efficient way I can order my results by the number of keyword occurrences? More specifically, I'm not sure if I should be doing some sort of Count() operation whilst querying, or looping through the results afterwards and doing some sort of additional regex processing then.
In the end I sorted the QuerySet in Python after the filter, using a regex to count the keyword matches in each title.
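    import re

    def get_keyword_matches(text, pattern):
        # Count the keyword occurrences found in text
        return len(pattern.findall(text))

    # Compile once; re.IGNORECASE mirrors the case-insensitive title__iregex filter
    pattern = re.compile(regex_str, re.IGNORECASE)
    results = sorted(results, key=lambda my_object: get_keyword_matches(my_object.title, pattern), reverse=True)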
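As a quick check of this approach on the example titles from the question, here is a minimal sketch that uses plain strings in place of MyModel objects. The helper name rank_titles is hypothetical, and the pattern is compiled with re.IGNORECASE on the assumption that the sort should be case-insensitive like the title__iregex filter.

```python
import re

IGNORED_WORDS = {'for', 'a', 'of', 'the', 'and', 'to', 'in'}

def rank_titles(query, titles):
    """Return the matching titles, those with the most keyword occurrences first."""
    keywords = [w for w in query.split() if w.lower() not in IGNORED_WORDS]
    if not keywords:
        return []
    # Escape keywords so they match literally; IGNORECASE mimics title__iregex
    pattern = re.compile('|'.join(re.escape(w) for w in keywords), re.IGNORECASE)
    counted = [(title, len(pattern.findall(title))) for title in titles]
    matching = [(title, n) for title, n in counted if n > 0]
    # sorted() is stable, so titles with equal counts keep their original order
    return [title for title, n in sorted(matching, key=lambda pair: pair[1], reverse=True)]

print(rank_titles('Delicious Apples and Bananas', ['Apples', 'Bananas', 'Apples and Bananas']))
# ['Apples and Bananas', 'Apples', 'Bananas']
```

'Apples and Bananas' matches two keywords and so comes first; the other two titles match one keyword each and keep their original relative order.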
If there's a more efficient way of doing this, however, I'd be keen to hear it.
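One possible direction is to let the database compute the score, so the rows come back already ordered. This is only a sketch under stated assumptions, not a tested Django solution. To keep it runnable on its own, it uses the standard-library sqlite3 module with a hypothetical my_model table and a hypothetical rank_titles_in_sql helper. Note that it counts how many distinct keywords each title contains, not the total number of occurrences. In Django, conditional expressions (Case and When) inside annotate() might express a similar per-keyword sum, but that mapping is an assumption and is not shown here.

```python
import sqlite3

def rank_titles_in_sql(titles, keywords):
    """Return (title, match_count) rows, titles containing the most keywords first."""
    if not keywords:
        return []
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE my_model (id INTEGER PRIMARY KEY, title TEXT)')
    conn.executemany('INSERT INTO my_model (title) VALUES (?)', [(t,) for t in titles])

    # One 0/1 term per keyword; SQLite's LIKE is case-insensitive for ASCII.
    # LIKE wildcards (% and _) inside a keyword are not escaped in this sketch.
    score = ' + '.join(['(title LIKE ?)'] * len(keywords))
    params = ['%' + word + '%' for word in keywords]
    sql = (
        'SELECT title, match_count FROM '
        '(SELECT id, title, ' + score + ' AS match_count FROM my_model) '
        'WHERE match_count > 0 ORDER BY match_count DESC, id'
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows

print(rank_titles_in_sql(['Apples', 'Bananas', 'Apples and Bananas'],
                         ['Delicious', 'Apples', 'Bananas']))
# [('Apples and Bananas', 2), ('Apples', 1), ('Bananas', 1)]
```

Whether this beats sorting in Python depends on the table size and the database backend, so it would need measuring on real data.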