Django DRF Elasticsearch DSL: apply function boosting based on another field's numerical value

I am trying to tune the relevance of my search results based on the numerical value of another field (the field lives in another model). Looking at the ES documentation, it seems that function boosting (function_score) is what I need for my use case.

Note that I am using the django-elasticsearch-dsl package.

My use case:

  1. I have a search field where users can search for companies based on the company name.
  2. I want to return the company names that match the query, ranked so that the higher a company's net assets (a field in another model), the more relevant the result.

Model definition:

class Company(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=False, null=False)

    def __str__(self):
        return self.name

class CompanyNetAsset(models.Model):
    id = models.AutoField(primary_key=True)
    assets = models.IntegerField()  # the field class must be instantiated
    company_id = models.ForeignKey('Company', on_delete=models.PROTECT, blank=True, null=True)

    def __str__(self):
        # This model has no `name` field, so describe the row by its own fields
        return f'{self.company_id}: {self.assets}'

My ES document:

...
custom_stop_words = token_filter(
    'custom_stopwords',
    type='stop',
    ignore_case=True,
    stopwords=['the', 'and']

)


html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["lowercase", "asciifolding", custom_stop_words],
    char_filter=["html_strip"],
)


@INDEX.doc_type
class CompanyDocument(Document):
    id = fields.IntegerField(attr='id')

    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.TextField(analyzer='keyword'),
        }
    )

    class Django:
        model = Company

and here is the DocumentViewSet:

class CompanyDocumentViewSet(DocumentViewSet):
    """The Company Document view."""
    serializer_class = CompanyDocumentSerializer
    lookup_field = 'id'
    document = CompanyDocument
    filter_backends = [
        FilteringFilterBackend,
        SearchFilterBackend,
    ]
    search_fields = (
        'name',  # trailing comma makes this a tuple rather than a plain string
    )

    filter_fields = {
        'id': None,
        'name': 'name.raw',

    }

Any idea how I can achieve this using the drf ES package?

UPDATE

Here's an example query:

 /api/v1/employers/companies/?search=name:foundation%20center

 "results": [
        {
            "id": 469329,
            "name": "THE FOUNDATION CENTER",
            "city": "NEW YORK",
            "state": "NY"
        },
        {
            "id": 323012,
            "name": "OVERTURE CENTER FOUNDATION",
            "city": "MADISON",
            "state": "WI"
        },
        {
            "id": 367286,
            "name": "PEACE CENTER FOUNDATION",
            "city": "GREENVILLE",
            "state": "SC"
        },
       ...

And here is a JSON output of the document:

{'settings': {'number_of_shards': 1,
  'number_of_replicas': 1,
  'analysis': {'analyzer': {'html_strip': {'tokenizer': 'standard',
     'filter': ['lowercase', 'asciifolding', 'custom_stopwords'],
     'char_filter': ['html_strip'],
     'type': 'custom'}},
   'filter': {'custom_stopwords': {'ignore_case': True,
     'stopwords': ['the', 'and'],
     'type': 'stop'}}}},
 'mappings': {'properties': {'id': {'type': 'integer'},
   'city': {'type': 'text'},
   'state': {'type': 'text'},
   'name': {'analyzer': 'html_strip',
    'fields': {'raw': {'analyzer': 'keyword', 'type': 'text'}},
    'type': 'text'}}}}

ES doesn't support SQL-style query-time joins, so a value that should influence ranking has to live on the indexed document itself. You'll need to replicate the CompanyNetAsset assets value in the Company model so that it can influence your sorting.

In practical terms, adjust the Company model like so:

class Company(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=False, null=False)
    assets = models.IntegerField(default=0)  # copied over from CompanyNetAsset

    def __str__(self):
        return self.name
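
The new assets column still has to be filled from the existing CompanyNetAsset rows before reindexing. The following is a minimal, framework-free sketch of that flattening step. NetAssetRow and assets_by_company are hypothetical names used only for illustration, and keeping the largest value per company is an assumption, since the question does not say whether a company can have several rows.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class NetAssetRow:
    """Hypothetical stand-in for one CompanyNetAsset row."""
    company_id: Optional[int]
    assets: int


def assets_by_company(rows: Iterable[NetAssetRow]) -> Dict[int, int]:
    """Collapse rows to one value per company (assumption: keep the largest)."""
    result: Dict[int, int] = {}
    for row in rows:
        if row.company_id is None:
            # The foreign key is nullable in the question's model; skip orphans.
            continue
        previous = result.get(row.company_id, row.assets)
        result[row.company_id] = max(previous, row.assets)
    return result


rows = [
    NetAssetRow(1, 500),
    NetAssetRow(1, 900),
    NetAssetRow(2, 300),
    NetAssetRow(None, 50),
]
flattened = assets_by_company(rows)
```

In a real project the resulting mapping would be written back to Company.assets (for example in a data migration) before the index is rebuilt.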

Then adjust the Document:

@INDEX.doc_type
class CompanyDocument(Document):
    id = fields.IntegerField(attr='id')

    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.TextField(analyzer='keyword'),
        }
    )

    assets = fields.IntegerField(attr='assets')

    class Django:
        model = Company

Finally, reindex your docs and define the sort:

class CompanyDocumentViewSet(DocumentViewSet):
    """The Company Document view."""
    serializer_class = CompanyDocumentSerializer
    lookup_field = 'id'
    document = CompanyDocument
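    filter_backends = [
        FilteringFilterBackend,
        SearchFilterBackend,
        # From django_elasticsearch_dsl_drf.filter_backends; needed for ordering to apply
        OrderingFilterBackend,
        DefaultOrderingFilterBackend,
    ]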
    search_fields = (
        'name',  # trailing comma makes this a tuple rather than a plain string
    )

    filter_fields = {
        'id': None,
        'name': 'name.raw',

    }
   
    # Define ordering fields        # <--
    ordering_fields = {
        'assets': None
    }

    # Specify default ordering: highest assets first (note the trailing comma)
    ordering = ('-assets',)
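
For intuition, an ordering value ends up as a sort clause in the Elasticsearch request body. The sketch below shows roughly what that translation looks like; ordering_to_sort is a hypothetical helper written for illustration, not the package's actual implementation.

```python
from typing import Dict, List


def ordering_to_sort(ordering: str) -> List[Dict[str, Dict[str, str]]]:
    """Turn a DRF-style ordering string into an Elasticsearch sort list."""
    clauses = []
    for name in ordering.split(","):
        name = name.strip()
        if not name:
            continue
        # A leading "-" means descending, as in Django's order_by().
        order = "desc" if name.startswith("-") else "asc"
        clauses.append({name.lstrip("-"): {"order": order}})
    return clauses


sort_clause = ordering_to_sort("-assets")
```

Note that a plain sort like this replaces relevance ordering entirely: among the matching companies, only assets decides the position.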

To enforce the sort order via the URI, run:

GET /api/v1/employers/companies/?search=name:foundation%20center&ordering=-assets
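
If a hard sort by assets is too blunt and the text match should still count, the function boosting mentioned in the question is the other option once assets is on the document. Below is a minimal sketch of such a request body, built as a plain dict. boosted_name_query is a hypothetical helper, and the choices of log1p and boost_mode "sum" are assumptions to be tuned; how to plug the query into the DRF viewset depends on the package version and is not shown here.

```python
import json
from typing import Any, Dict


def boosted_name_query(text: str, boost_field: str = "assets") -> Dict[str, Any]:
    """Build a function_score body that adds log10(1 + assets) to the text score."""
    return {
        "query": {
            "function_score": {
                # Base relevance: the usual full-text match on the company name.
                "query": {"match": {"name": text}},
                # Boost derived from the numeric field copied onto the document.
                "field_value_factor": {
                    "field": boost_field,
                    "modifier": "log1p",  # dampens very large asset values
                    "missing": 0,         # documents without the field get no boost
                },
                # "sum" keeps the text score for companies with zero assets.
                "boost_mode": "sum",
            }
        }
    }


body = boosted_name_query("foundation center")
print(json.dumps(body, indent=2))
```

Net assets can be negative, and a logarithmic modifier does not accept negative input, so such values would likely need to be clamped to zero before indexing.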
