I'm trying to use the Django SearchVectorField to support full text search. However, I'm getting different search results when I use the SearchVectorField on my model vs. instantiating a SearchVector class in my view. The problem is isolated to an AWS RDS PostgreSQL instance. Both perform the same on my laptop.
Let me try to explain it with some code:
# models.py
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Tweet(models.Model):
    def __str__(self):
        return self.tweet_id

    tweet_id = models.CharField(max_length=25, unique=True)
    text = models.CharField(max_length=1000)
    text_search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        indexes = [GinIndex(fields=['text_search_vector'])]
I've populated all rows with a search vector and have established a trigger on the database to keep the field up to date.
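# views.py
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from django.db.models import F

from .models import Tweet

query = SearchQuery('chance')
vector = SearchVector('text')

# Rank against a vector computed at query time from the text column
on_the_fly = Tweet.objects.annotate(
    rank=SearchRank(vector, query)
).filter(
    rank__gte=0.001
)

# Rank against the stored, trigger-maintained vector column
from_field = Tweet.objects.annotate(
    rank=SearchRank(F('text_search_vector'), query)
).filter(
    rank__gte=0.001
)

# len(on_the_fly) == 32
# len(from_field) == 0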
The on_the_fly queryset, which uses a SearchVector instance, returns 32 results. The from_field queryset, which uses the SearchVectorField, returns 0 results.
The empty result prompted me to drop into the shell to debug. Here's some output from the command line in my python manage.py shell environment:
>>> qs = Tweet.objects.filter(
...     tweet_id__in=[949763170863865857, 961432484620787712]
... ).annotate(
...     vector=SearchVector('text')
... )
>>>
>>> for tweet in qs:
...     print(f'Doc text: {tweet.text}')
...     print(f'From db: {tweet.text_search_vector}')
...     print(f'From qs: {tweet.vector}\n')
...
Doc text: @Espngreeny Run your 3rd and long play and compete for a chance on third down.
From db: '3rd':4 'chanc':12 'compet':9 'espngreeni':1 'long':6 'play':7 'run':2 'third':14
From qs: '3rd':4 'a':11 'and':5,8 'chance':12 'compete':9 'down':15 'espngreeny':1 'for':10 'long':6 'on':13 'play':7 'run':2 'third':14 'your':3
Doc text: No chance. It was me complaining about Girl Scout cookies. <url-removed-for-stack-overflow>
From db: '/aggcqwddbh':13 'chanc':2 'complain':6 'cooki':10 'girl':8 'scout':9 't.co':12 't.co/aggcqwddbh':11
From qs: '/aggcqwddbh':13 'about':7 'chance':2 'complaining':6 'cookies':10 'girl':8 'it':3 'me':5 'no':1 'scout':9 't.co':12 't.co/aggcqwddbh':11 'was':4
You can see that the search vector looks very different when comparing the value from the database to the value that's generated via Django.
Does anyone have any ideas as to why this would happen? Thanks!
SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithm of the active text search configuration, and then it looks for matches for all of the resulting terms. Your stored vector contains stemmed lexemes such as 'chanc', while the vector built in the queryset keeps 'chance' unstemmed, which suggests the two were produced with different configurations. Two issues need to be solved. First, tell the stemming algorithm which language to use:
query = SearchQuery('chance', config='english')
Second, replace this line
rank=SearchRank(F('text_search_vector'), query)
with
rank=SearchRank('text_search_vector', query)
As for the words missing from text_search_vector: removing common words, known as stop words, is standard behavior for a language-aware configuration such as english.
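To make the stemming and stop-word points concrete, here is a toy model in plain Python of why an unstemmed query finds nothing in a stemmed vector. It is only a sketch: the stop-word list, the suffix rules, and the function names (simple_lexemes, crude_stem, english_like_lexemes) are made up for illustration and are not PostgreSQL's or Django's actual implementation.

```python
# Toy model of two text search configurations. This is NOT PostgreSQL's
# implementation: the stop-word list and suffix rules are invented for the demo.
import re

# Hypothetical stop-word list, just large enough for the example tweet
STOP_WORDS = {"a", "and", "for", "on", "your", "down",
              "no", "it", "was", "me", "about"}


def simple_lexemes(text):
    """Mimic a 'simple'-style config: lowercase tokens, nothing else."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def english_like_lexemes(text):
    """Mimic an 'english'-style config: drop stop words, then stem."""
    return {crude_stem(w) for w in simple_lexemes(text) - STOP_WORDS}


doc = "No chance. It was me complaining about Girl Scout cookies."
# Plays the role of the stored, trigger-maintained column
stored_vector = english_like_lexemes(doc)

# Query parsed without stemming: 'chance' is not among the stored lexemes
mismatch = simple_lexemes("chance") <= stored_vector
# Query parsed with the same english-style rules: 'chanc' is found
match = english_like_lexemes("chance") <= stored_vector
print(mismatch, match)
```

The same reasoning applies to the real query: whichever configuration built the stored vector should also be the one that parses the search term.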